This paper considers an entropy-power inequality (EPI) of Costa and presents a natural vector generalization with a real positive semidefinite matrix parameter. This new inequality is proved using a perturbation approach via a fundamental relationship between the derivative of mutual information and the minimum mean-square error (MMSE) estimate in linear vector Gaussian channels. As an application, a new extremal entropy inequality is derived from the generalized Costa EPI and then used to establish the secrecy capacity regions of the degraded vector Gaussian broadcast channel with layered confidential messages.
I. Introduction

In information theory, the entropy-power inequality (EPI) of Shannon [1] and Stam [2] has played key roles in the solution of several canonical network communication problems. Celebrated examples include Bergmans's solution [3] to the Gaussian broadcast channel problem, Leung-Yan-Cheong and Hellman's solution [4] to the Gaussian wire-tap channel problem, Ozarow's solution [5] to the Gaussian two-description problem, Oohama's solution [6] to the quadratic Gaussian CEO problem, and more recently Weingarten, Steinberg and Shamai's solution [7] to the multiple-input multiple-output Gaussian broadcast channel problem.
Let X and Z be two independent random n-vectors with densities in $\mathbb{R}^n$, where $\mathbb{R}$ denotes the set of real numbers. The classical EPI of Shannon [1] and Stam [2] can be written as

$$\exp\left[\frac{2}{n}h(X+Z)\right] \ge \exp\left[\frac{2}{n}h(X)\right] + \exp\left[\frac{2}{n}h(Z)\right] \tag{1}$$

where $h(X)$ denotes the differential entropy of X. The equality holds if and only if X and Z are Gaussian with proportional covariance matrices.

In network information theory, most applications focus on the special case of (1) where one of the random vectors is fixed to be Gaussian. In this setting, the classical EPI of Shannon and Stam can be further strengthened, as shown by Costa [8]. Let Z be a Gaussian random n-vector with a positive definite covariance matrix, and let a be a real scalar such that $a \in [0, 1]$. Costa's EPI [8] can be written as

$$\exp\left[\frac{2}{n}h(X+\sqrt{a}Z)\right] \ge (1-a)\exp\left[\frac{2}{n}h(X)\right] + a\exp\left[\frac{2}{n}h(X+Z)\right] \tag{2}$$

for any random n-vector X independent of Z. The equality holds if and only if X is also Gaussian with a covariance matrix proportional to that of Z.
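For Gaussian inputs, both (1) and (2) reduce to determinant inequalities, because a Gaussian n-vector with covariance matrix K has entropy power $\exp[\frac{2}{n}h] = 2\pi e|K|^{1/n}$. The following NumPy sketch is our own illustration (the helper names `entropy_power` and `random_pd` are hypothetical): it evaluates both sides of (1) and (2) for randomly drawn covariance matrices and checks that equality in (2) is attained when the covariance of X is proportional to that of Z.

```python
import numpy as np

def entropy_power(K):
    """exp[(2/n) h] for a Gaussian n-vector with covariance K (natural log units)."""
    n = K.shape[0]
    sign, logdet = np.linalg.slogdet(K)
    assert sign > 0, "covariance must be positive definite"
    return 2 * np.pi * np.e * np.exp(logdet / n)

def random_pd(n, rng):
    """Random positive definite matrix (hypothetical test helper)."""
    G = rng.standard_normal((n, n))
    return G @ G.T + 0.1 * np.eye(n)

rng = np.random.default_rng(0)
n, a = 3, 0.4
B = random_pd(n, rng)   # covariance of X (Gaussian case)
N = random_pd(n, rng)   # covariance of Z

# Classical EPI (1) for Gaussian X: |B + N|^{1/n} >= |B|^{1/n} + |N|^{1/n}
epi_slack = entropy_power(B + N) - entropy_power(B) - entropy_power(N)

# Costa's EPI (2) for Gaussian X: X + sqrt(a) Z has covariance B + a N
costa_slack = (entropy_power(B + a * N)
               - (1 - a) * entropy_power(B) - a * entropy_power(B + N))

# Equality case of (2): covariance of X proportional to that of Z
B_prop = 2.5 * N
costa_eq_gap = (entropy_power(B_prop + a * N)
                - (1 - a) * entropy_power(B_prop) - a * entropy_power(B_prop + N))

print(epi_slack, costa_slack, costa_eq_gap)
```

In the Gaussian case both inequalities are instances of the Minkowski determinant inequality $|P+Q|^{1/n} \ge |P|^{1/n} + |Q|^{1/n}$ for positive semidefinite P and Q.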

Though not as widely known as the classical EPI of Shannon and Stam, Costa's EPI has found useful applications 
in deriving capacity bounds for the Gaussian interference channel [9] and the multiantenna flat-fading channel [10].
The original proof of Costa's EPI provided in [8] was based on rather detailed calculations. Simplified proofs 
based on a Fisher information inequality [11] and a fundamental relationship between the derivative of mutual 
information and minimum mean-square error (MMSE) in linear Gaussian channels [12] can be found in [13] and 
[14], respectively. 

Note that Costa's EPI (2) provides a strong relationship among the differential entropies of three random vectors: X, $X + \sqrt{a}Z$ and $X + Z$. To apply, the increments of $X + \sqrt{a}Z$ and $X + Z$ over X need to be Gaussian and have proportional covariance matrices. For some applications in network information theory (as we will see shortly), the proportionality requirement may turn out to be overly restrictive. A main contribution of this paper is to prove a natural generalization of Costa's EPI (2) by replacing the real scalar a with a positive semidefinite matrix parameter. The result is summarized in the following theorem.

Theorem 1 (Generalized Costa's EPI): Let Z be a Gaussian random n-vector with a positive definite covariance matrix N, and let A be an $n \times n$ real symmetric matrix such that $0 \preceq A \preceq I$. Here, I denotes the $n \times n$ identity matrix, $\preceq$ denotes "less than or equal to" in the positive semidefinite partial ordering between real symmetric matrices, and $|\cdot|$ denotes the determinant. Then,

$$\exp\left[\frac{2}{n}h(X+A^{1/2}Z)\right] \ge |I-A|^{1/n}\exp\left[\frac{2}{n}h(X)\right] + |A|^{1/n}\exp\left[\frac{2}{n}h(X+Z)\right] \tag{3}$$

for any random n-vector X independent of Z. The equality holds if X is Gaussian with a covariance matrix B such that $B - AB$ and $B + A^{1/2}NA^{1/2}$ are proportional.
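When X is Gaussian with covariance matrix B, (3) becomes the determinant inequality $|B + A^{1/2}NA^{1/2}|^{1/n} \ge |B - AB|^{1/n} + |AB + AN|^{1/n}$ (see Section III). The sketch below is our own illustration and deliberately assumes that A, B and N are all diagonal: in that special case each diagonal entry satisfies $b_i + a_i n_i = (1-a_i)b_i + a_i(b_i + n_i)$, so the check reduces to the superadditivity of the geometric mean. The function name `gen_costa_gap` is hypothetical.

```python
import numpy as np

def gen_costa_gap(a, b, nz):
    """LHS minus RHS of the Gaussian form of (3) for diagonal A, B, N.

    a: eigenvalues of A in [0, 1]; b: variances of X (>= 0); nz: variances of Z (> 0).
    """
    n = len(a)
    gm = lambda v: np.prod(v) ** (1.0 / n)      # |diag(v)|^{1/n}
    lhs = gm(b + a * nz)                        # |B + A^{1/2} N A^{1/2}|^{1/n}
    rhs = gm((1 - a) * b) + gm(a * (b + nz))    # |B - AB|^{1/n} + |AB + AN|^{1/n}
    return lhs - rhs

rng = np.random.default_rng(1)
gaps = []
for _ in range(1000):
    a = rng.uniform(0.0, 1.0, size=4)
    b = rng.uniform(0.0, 5.0, size=4)
    nz = rng.uniform(0.1, 5.0, size=4)
    gaps.append(gen_costa_gap(a, b, nz))

# Equality case: B - AB = 0.7 B and B + A^{1/2} N A^{1/2} = 1.6 B are proportional
a = np.full(4, 0.3)
b = np.array([1.0, 2.0, 3.0, 4.0])
nz = 2.0 * b
print(min(gaps), gen_costa_gap(a, b, nz))
```

The sketch does not exercise non-commuting choices of A and N; it only illustrates the structure of the inequality in the simplest matrix setting.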

Note that when $A = aI$, the generalized Costa EPI (3) reduces to the original Costa EPI (2). On the other hand, when A is not a scaled identity, the covariance matrices of the increments of $X + A^{1/2}Z$ and $X + Z$ over X do not
need to be proportional. As we will see, the ability to cope with a general matrix parameter makes the generalized 
Costa EPI more flexible and powerful than the original Costa EPI. 

A different but related generalization of Costa's EPI was considered by Payaro and Palomar [15], who examined the concavity of the entropy power $\exp[\frac{2}{n}h(A^{1/2}X + Z)]$ with respect to the matrix parameter A. This line of research was motivated by the observation that the original Costa EPI (2) is equivalent to the concavity of the entropy power $\exp[\frac{2}{n}h(\sqrt{a}X + Z)]$ with respect to the scalar parameter a. Unlike the scalar case, Payaro and Palomar [15] showed that the entropy power $\exp[\frac{2}{n}h(A^{1/2}X + Z)]$ is in general not concave with respect to the matrix parameter A. However, the concavity does hold when A is restricted to be diagonal [15].

In information theory, a main application of the EPI is to derive extremal entropy inequalities, which can then be 
used to solve network communication problems. In their work [16], Liu and Viswanath derived an extremal entropy 
inequality based on the classical EPI of Shannon [1] and Stam [2] and used it to establish the private message 
capacity region of the vector Gaussian broadcast channel via the Marton outer bound [17, Theorem 5]. In this paper, 
we will derive a new extremal entropy inequality based on the generalized Costa EPI and use it to characterize the 
secrecy capacity regions of the degraded vector Gaussian broadcast channel with layered confidential messages. 

The rest of the paper is organized as follows. In Section II, we summarize the main results of the paper, including a new extremal entropy inequality and its applications to the degraded vector Gaussian broadcast channel with layered confidential messages. In Section III, we prove the generalized Costa EPI, following a perturbation approach via a fundamental relationship between the derivative of mutual information and the MMSE estimate in linear vector Gaussian channels [18, Theorem 2]. In Section IV, we derive the new extremal entropy inequality from the generalized Costa EPI. The coding theorems for the degraded vector Gaussian broadcast channel with layered confidential messages are proved in Sections V and VI. Finally, in Section VII, we conclude the paper with some remarks.

II. Summary of Main Results 

The following notation will be used throughout the paper. A random vector is denoted with an upper-case letter (e.g., X), its realization is denoted with the corresponding lower-case letter (e.g., x), and its probability density function is denoted with $p(x) = p_X(x)$. We use $E[X]$ to denote the expectation of X. Thus, the covariance matrix of X is given by

$$\mathrm{Cov}(X) = E\left[(X - E[X])(X - E[X])^T\right].$$

Given any jointly distributed random vectors (X, Y), the MMSE estimate of X from the observation Y is the conditional mean $E[X|Y]$. The MMSE (matrix) is given by

$$\mathrm{Cov}(X|Y) = E\left[(X - E[X|Y])(X - E[X|Y])^T\right].$$
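For jointly Gaussian vectors the MMSE matrix has a closed form, and this is the case the proofs below lean on. As a minimal sketch, assuming zero-mean $X \sim \mathcal{N}(0, B)$ and $Z \sim \mathcal{N}(0, N)$ independent and $Y = HX + Z$ (H, B, N are hypothetical random matrices here), the code computes $\mathrm{Cov}(X|Y) = B - BH^T(HBH^T + N)^{-1}HB$, compares it with the information form $(B^{-1} + H^TN^{-1}H)^{-1}$, and compares it with the residual covariance of a least-squares fit on simulated samples (for Gaussian data $E[X|Y]$ is linear in Y).

```python
import numpy as np

def gaussian_mmse(B, H, N):
    """Cov(X|Y) for Y = H X + Z, X ~ N(0, B), Z ~ N(0, N) independent."""
    S_y = H @ B @ H.T + N                          # Cov(Y)
    return B - B @ H.T @ np.linalg.solve(S_y, H @ B)

rng = np.random.default_rng(2)
n = 3
G = rng.standard_normal((n, n)); B = G @ G.T + 0.5 * np.eye(n)
G = rng.standard_normal((n, n)); N = G @ G.T + 0.5 * np.eye(n)
H = rng.standard_normal((n, n))

E = gaussian_mmse(B, H, N)
E_info = np.linalg.inv(np.linalg.inv(B) + H.T @ np.linalg.inv(N) @ H)

# Monte Carlo: a linear least-squares fit recovers E[X|Y] for Gaussian data
m = 200_000
X = rng.multivariate_normal(np.zeros(n), B, size=m)
Z = rng.multivariate_normal(np.zeros(n), N, size=m)
Y = X @ H.T + Z
W, *_ = np.linalg.lstsq(Y, X, rcond=None)          # X is approximated by Y @ W
R = X - Y @ W                                      # estimation error samples
E_mc = R.T @ R / m
print(np.round(E, 3))
print(np.round(E_mc, 3))
```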



A. A New Extremal Entropy Inequality 

The following extremal entropy inequality is a consequence of the generalized Costa EPI.

Theorem 2: Let $Z_k$, $k = 0, \ldots, K$, be a total of $K + 1$ Gaussian random n-vectors with positive definite covariance matrices $N_k$, respectively. Assume that $N_1 \preceq \cdots \preceq N_K$. If there exists an $n \times n$ positive semidefinite matrix $B^*$ such that

$$\sum_{k=1}^{K}\mu_k(B^* + N_k)^{-1} + M_1 = (B^* + N_0)^{-1} + M_2 \tag{4}$$

for some $n \times n$ positive semidefinite matrices $M_1$, $M_2$ and S with

$$B^*M_1 = 0 \tag{5}$$

and

$$(S - B^*)M_2 = 0 \tag{6}$$

and real scalars $\mu_k \ge 0$ with $\sum_{k=1}^{K}\mu_k = 1$, then

$$\sum_{k=1}^{K}\mu_k h(X + Z_k|U) - h(X + Z_0|U) \le \sum_{k=1}^{K}\frac{\mu_k}{2}\log|B^* + N_k| - \frac{1}{2}\log|B^* + N_0| \tag{7}$$

for any (X, U) independent of $(Z_0, \ldots, Z_K)$ such that $E[XX^T] \preceq S$.

Note that (4)-(6) are precisely the Karush-Kuhn-Tucker (KKT) conditions (see [7, Appendix IV] and [19, Section 5.2]) for the optimization program

$$\max_{0 \preceq B \preceq S}\left[\sum_{k=1}^{K}\frac{\mu_k}{2}\log|B + N_k| - \frac{1}{2}\log|B + N_0|\right].$$

Therefore, (7) implies that a jointly Gaussian (U, X) such that, for each U = u, X has the same covariance matrix $B^*$ is an optimal solution to the optimization program

$$\max_{(U,X)}\left[\sum_{k=1}^{K}\mu_k h(X + Z_k|U) - h(X + Z_0|U)\right]$$

where the maximization is over all (U, X) independent of $(Z_0, \ldots, Z_K)$ such that $E[XX^T] \preceq S$. Note that when K = 1, this is a special case of [16, Theorem 8] with $\mu = 1$.
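To make the KKT characterization concrete, the following sketch (ours; every number is hypothetical) takes the scalar case n = 1 with K = 2, $N_1 = 1$, $N_2 = 4$, $N_0 = 2$, $\mu_1 = \mu_2 = 1/2$ and S = 5. It maximizes $\sum_k \frac{\mu_k}{2}\log(b + N_k) - \frac{1}{2}\log(b + N_0)$ over $0 \le b \le S$ by grid search and then reads off nonnegative multipliers $M_1$, $M_2$ satisfying (4)-(6). Solving (4) by hand with $M_1 = M_2 = 0$ gives $(2b+5)(b+2) = 2(b+1)(b+4)$, i.e., $b^* = 2$.

```python
import numpy as np

# Hypothetical scalar instance of the program behind (4)-(6)
N0, N = 2.0, np.array([1.0, 4.0])     # N_0 and (N_1, N_2), with N_1 <= N_2
mu = np.array([0.5, 0.5])             # weights mu_k >= 0 summing to 1
S = 5.0

def objective(b):
    return 0.5 * np.sum(mu * np.log(b + N)) - 0.5 * np.log(b + N0)

def grad(b):
    """Left side of (4) minus right side, without the multipliers."""
    return np.sum(mu / (b + N)) - 1.0 / (b + N0)

grid = np.linspace(0.0, S, 50_001)
b_star = grid[np.argmax([objective(b) for b in grid])]

# Multipliers: M_1 >= 0 can be active only at b = 0, M_2 >= 0 only at b = S
g = grad(b_star)
M1 = max(0.0, -g) if np.isclose(b_star, 0.0) else 0.0
M2 = max(0.0, g) if np.isclose(b_star, S) else 0.0
residual = g + M1 - M2                 # (4) rearranged; close to 0 at the optimum
print(b_star, residual, b_star * M1, (S - b_star) * M2)
```

Here the optimum is interior, so both multipliers vanish; moving S below 2 would instead activate $M_2$ at $b^* = S$.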
Fig. 1. Degraded vector Gaussian broadcast channel with layered confidential messages.



B. Applications to the Degraded Vector Gaussian Broadcast Channel with Layered Confidential Messages

Consider the following vector Gaussian broadcast channel with three receivers:

$$Y_k[t] = X[t] + Z_k[t], \quad k = 1, 2, 3 \tag{8}$$

where $\{Z_k[t]\}_t$, k = 1, 2, 3, are independent and identically distributed additive vector Gaussian noise processes with zero means and positive definite covariance matrices $N_k$, respectively. The channel input $\{X[t]\}_t$ is subject to a matrix constraint:

$$\frac{1}{n}\sum_{t=1}^{n}X[t]X^T[t] \preceq S \tag{9}$$

where S is a positive semidefinite matrix, and n is the block length. We assume that the noise covariance matrices are ordered as

$$N_1 \preceq N_2 \preceq N_3, \tag{10}$$

i.e., the received signal $Y_3[t]$ is (stochastically) degraded with respect to $Y_2[t]$, which is further degraded with respect to $Y_1[t]$.

We consider two different communication scenarios, both with two independent messages $W_1$ and $W_2$. In the first scenario (see Fig. 1(a)), message $W_1$ is intended for receiver 1 but needs to be kept secret from receivers 2 and 3, and message $W_2$ is intended for receivers 1 and 2 but needs to be kept confidential from receiver 3. In the second scenario (see Fig. 1(b)), message $W_1$ is intended for receiver 1 but needs to be kept secret from receiver 3, and message $W_2$ is intended for receivers 1 and 2 but needs to be kept secret from receiver 3. The confidentiality of the messages at the unintended receivers is measured using the normalized information-theoretic criteria [20], [21]:

$$\frac{1}{n}I(W_1; Y_2^n) \to 0, \quad \frac{1}{n}I(W_1; Y_3^n) \to 0, \quad \text{and} \quad \frac{1}{n}I(W_2; Y_3^n) \to 0 \tag{11}$$

for the first scenario and

$$\frac{1}{n}I(W_1; Y_3^n) \to 0 \quad \text{and} \quad \frac{1}{n}I(W_2; Y_3^n) \to 0 \tag{12}$$

for the second scenario. Here, the limits are taken as the block length $n \to \infty$. The goal is to characterize the entire secrecy rate region $\mathcal{C}_s = \{(R_1, R_2)\}$ that can be achieved by any coding scheme.

To characterize the secrecy capacity regions, we will first consider the discrete memoryless version of the problem with transition probability $p(y_1, y_2, y_3|x)$ and degradedness order

$$X \to Y_1 \to Y_2 \to Y_3. \tag{13}$$

We have the following single-letter characterizations of the secrecy capacity regions. 

Theorem 3: The secrecy capacity region of the discrete memoryless broadcast channel $p(y_1, y_2, y_3|x)$ with confidential messages $W_1$ (intended for receiver 1 but needs to be kept secret from receivers 2 and 3) and $W_2$ (intended for receivers 1 and 2 but needs to be kept secret from receiver 3) under the degradedness order (13) is given by the set of nonnegative rate pairs $(R_1, R_2)$ such that

$$R_1 \le I(X; Y_1|U) - I(X; Y_2|U) \quad \text{and} \quad R_2 \le I(U; Y_2) - I(U; Y_3) \tag{14}$$

for some jointly distributed (U, X) satisfying the Markov relation

$$U \to X \to (Y_1, Y_2, Y_3).$$
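As an illustration only, the sketch below evaluates the two bounds in (14) for a hypothetical binary example of our own choosing: U is uniform, X is U passed through a binary symmetric channel (BSC) with crossover 0.1, and the degraded chain $X \to Y_1 \to Y_2 \to Y_3$ is built by cascading BSCs with crossovers 0.05, 0.1 and 0.1. Mutual informations are computed in bits directly from the joint probability mass functions; the helper names are hypothetical.

```python
import numpy as np

def bsc(p):
    """Transition matrix of a binary symmetric channel with crossover p."""
    return np.array([[1 - p, p], [p, 1 - p]])

def entropy(pmf):
    pmf = pmf[pmf > 0]
    return -np.sum(pmf * np.log2(pmf))

def mutual_info(p_ab):
    """I(A;B) in bits from a 2-D joint pmf."""
    return entropy(p_ab.sum(1)) + entropy(p_ab.sum(0)) - entropy(p_ab.ravel())

def cond_mutual_info(p_uab):
    """I(A;B|U) in bits from a 3-D joint pmf indexed [u, a, b]."""
    return sum(p_uab[u].sum() * mutual_info(p_uab[u] / p_uab[u].sum())
               for u in range(p_uab.shape[0]) if p_uab[u].sum() > 0)

# Hypothetical example: U -> X -> Y1 -> Y2 -> Y3 with BSC links
p_u = np.array([0.5, 0.5])
P_xu = bsc(0.1)                      # p(x|u), rows indexed by u
P1 = bsc(0.05)                       # p(y1|x)
P2 = P1 @ bsc(0.1)                   # p(y2|x): Y2 is a degraded version of Y1
P3 = P2 @ bsc(0.1)                   # p(y3|x): Y3 is a degraded version of Y2

p_ux = p_u[:, None] * P_xu           # joint p(u, x)
p_uxy = [p_ux[:, :, None] * P[None, :, :] for P in (P1, P2, P3)]  # p(u, x, y_k)

R1_bound = cond_mutual_info(p_uxy[0]) - cond_mutual_info(p_uxy[1])
R2_bound = mutual_info(p_uxy[1].sum(1)) - mutual_info(p_uxy[2].sum(1))
print(f"R1 <= {R1_bound:.4f} bits, R2 <= {R2_bound:.4f} bits")
```

Both bounds come out positive here because each added BSC link strictly degrades the observation, in line with the order (13).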

Theorem 4 ([22, Theorem 2]): The secrecy capacity region of the discrete memoryless broadcast channel $p(y_1, y_2, y_3|x)$ with confidential messages $W_1$ (intended for receiver 1 but needs to be kept secret from receiver 3) and $W_2$ (intended for receivers 1 and 2 but needs to be kept secret from receiver 3) under the degradedness order (13) is given by the set of nonnegative rate pairs $(R_1, R_2)$ such that

$$R_1 \le I(X; Y_1|U) - I(X; Y_3|U) \quad \text{and} \quad R_2 \le I(U; Y_2) - I(U; Y_3) \tag{15}$$

for some jointly distributed (U, X) satisfying the Markov relation

$$U \to X \to (Y_1, Y_2, Y_3).$$

A proof of Theorem 4 can be found in [22]. Theorem 3 can be proved in a similar fashion; for completeness, a proof is included in Appendix I. For the vector Gaussian broadcast channel (8) under the degradedness order (10), the single-letter expressions (14) and (15) can be further evaluated using the extremal entropy inequality (7). The results are summarized in the following theorems.

Theorem 5: The secrecy capacity region of the vector Gaussian broadcast channel (8) with confidential messages $W_1$ (intended for receiver 1 but needs to be kept secret from receivers 2 and 3) and $W_2$ (intended for receivers 1 and 2 but needs to be kept secret from receiver 3) and degradedness order (10) under the matrix constraint (9) is given by the set of nonnegative secrecy rate pairs $(R_1, R_2)$ such that

$$R_1 \le \frac{1}{2}\log\frac{|B + N_1|}{|N_1|} - \frac{1}{2}\log\frac{|B + N_2|}{|N_2|}$$

and

$$R_2 \le \frac{1}{2}\log\frac{|S + N_2|}{|B + N_2|} - \frac{1}{2}\log\frac{|S + N_3|}{|B + N_3|} \tag{16}$$

for some $0 \preceq B \preceq S$.

Theorem 6: The secrecy capacity region of the vector Gaussian broadcast channel (8) with confidential messages $W_1$ (intended for receiver 1 but needs to be kept secret from receiver 3) and $W_2$ (intended for receivers 1 and 2 but needs to be kept secret from receiver 3) and degradedness order (10) under the matrix constraint (9) is given by the set of nonnegative secrecy rate pairs $(R_1, R_2)$ such that

$$R_1 \le \frac{1}{2}\log\frac{|B + N_1|}{|N_1|} - \frac{1}{2}\log\frac{|B + N_3|}{|N_3|}$$

and

$$R_2 \le \frac{1}{2}\log\frac{|S + N_2|}{|B + N_2|} - \frac{1}{2}\log\frac{|S + N_3|}{|B + N_3|} \tag{17}$$

for some $0 \preceq B \preceq S$.
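The regions (16) and (17) are easy to trace numerically. The following sketch is ours: the covariance matrices are hypothetical, logarithms are natural (rates in nats), and only the one-parameter family $B = tS$, $t \in [0, 1]$, is swept, which is a subset of all admissible $0 \preceq B \preceq S$. Because $N_2 \preceq N_3$ makes $\frac{1}{2}\log(|B + N|/|N|)$ nonincreasing in N, the $R_1$ bound of Theorem 6 is never smaller than that of Theorem 5 for the same B, while the $R_2$ bounds coincide.

```python
import numpy as np

def half_logdet_ratio(P, Q):
    """(1/2) log(|P| / |Q|) for positive definite P, Q (nats)."""
    return 0.5 * (np.linalg.slogdet(P)[1] - np.linalg.slogdet(Q)[1])

def rate_bounds(B, S, N1, N2, N3):
    """Right-hand sides of (16) and (17) for a given 0 <= B <= S."""
    r1_thm5 = half_logdet_ratio(B + N1, N1) - half_logdet_ratio(B + N2, N2)
    r1_thm6 = half_logdet_ratio(B + N1, N1) - half_logdet_ratio(B + N3, N3)
    r2 = half_logdet_ratio(S + N2, B + N2) - half_logdet_ratio(S + N3, B + N3)
    return r1_thm5, r1_thm6, r2

# Hypothetical 2x2 example with N1 <= N2 <= N3 in the PSD order
N1 = np.array([[1.0, 0.3], [0.3, 0.8]])
N2 = N1 + np.array([[0.5, 0.0], [0.0, 1.0]])
N3 = N2 + np.array([[1.0, 0.2], [0.2, 0.5]])
S = np.array([[4.0, 1.0], [1.0, 3.0]])

ts = np.linspace(0.0, 1.0, 11)
boundary = [rate_bounds(t * S, S, N1, N2, N3) for t in ts]
for t, (r1a, r1b, r2) in zip(ts, boundary):
    print(f"t={t:.1f}  R1(Thm5)<={r1a:.3f}  R1(Thm6)<={r1b:.3f}  R2<={r2:.3f}")
```

At t = 0 all of the power goes to the common message ($R_1 = 0$), and at t = 1 all of it goes to $W_1$ ($R_2 = 0$).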



III. Proof of Theorem 1

In this section, we prove the generalized Costa EPI (3) as stated in Theorem 1. We first examine the equality condition. Note that when X is Gaussian with covariance matrix B, the generalized Costa EPI (3) becomes the matrix inequality

$$|B + A^{1/2}NA^{1/2}|^{1/n} \ge |B - AB|^{1/n} + |AB + AN|^{1/n}.$$

Suppose that $B - AB$ and $B + A^{1/2}NA^{1/2}$ are proportional, i.e., there exists a real scalar c such that

$$B + A^{1/2}NA^{1/2} = c(B - AB).$$

Since the left-hand side is symmetric, so is $B - AB$; as both A and B are symmetric, this implies that AB is also symmetric, i.e.,

$$AB = B^TA^T = BA.$$

Therefore, A and B must have the same eigenvector matrix [23] and hence

$$AB = A^{1/2}BA^{1/2}.$$

It follows that

$$A^{1/2}BA^{1/2} + A^{1/2}NA^{1/2} = B + A^{1/2}NA^{1/2} - (B - AB) = (c - 1)(B - AB),$$

i.e., $A^{1/2}BA^{1/2} + A^{1/2}NA^{1/2}$ and $B - AB$ are proportional. Therefore,

$$\begin{aligned}
|B + A^{1/2}NA^{1/2}|^{1/n} &= |B - AB + (A^{1/2}BA^{1/2} + A^{1/2}NA^{1/2})|^{1/n}\\
&= |B - AB|^{1/n} + |A^{1/2}BA^{1/2} + A^{1/2}NA^{1/2}|^{1/n}\\
&= |B - AB|^{1/n} + |AB + AN|^{1/n}
\end{aligned}$$

where the second equality uses $|P + \lambda P|^{1/n} = |P|^{1/n} + |\lambda P|^{1/n}$ for a positive semidefinite P and a scalar $\lambda \ge 0$. This proves the desired equality condition.
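The equality condition can be checked numerically. The sketch below (ours, with hypothetical numbers) builds A and B that share an eigenvector matrix U, picks a scalar c larger than every $1/(1 - a_i)$, and solves $B + A^{1/2}NA^{1/2} = c(B - AB)$ for N, which is then positive definite; both sides of the Gaussian form of (3) should agree up to rounding.

```python
import numpy as np

def mat_pow(M, p):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T

def det_root(M):
    """|M|^{1/n}; M need not be symmetric, but its determinant must be positive."""
    return np.linalg.det(M) ** (1.0 / M.shape[0])

rng = np.random.default_rng(3)
n = 3
U, _ = np.linalg.qr(rng.standard_normal((n, n)))    # common eigenvector matrix
a = np.array([0.2, 0.5, 0.7])                       # eigenvalues of A, in (0, 1)
b = np.array([1.0, 2.0, 0.5])                       # eigenvalues of B, > 0
A = (U * a) @ U.T
B = (U * b) @ U.T
c = 1.5 / (1 - a.max())                             # c > 1/(1 - a_i) for all i

# Solve B + A^{1/2} N A^{1/2} = c (B - A B) for N
A_mhalf = mat_pow(A, -0.5)
N = A_mhalf @ (c * (B - A @ B) - B) @ A_mhalf

A_half = mat_pow(A, 0.5)
lhs = det_root(B + A_half @ N @ A_half)
rhs = det_root(B - A @ B) + det_root(A @ B + A @ N)
print(lhs, rhs)
```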

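We now turn to the proof of the inequality. First consider the special case when $|A| = 0$. Since

$$h(X + A^{1/2}Z) - h(X) = I(A^{1/2}Z; X + A^{1/2}Z) \ge 0,$$

we have

$$\exp\left[\frac{2}{n}h(X + A^{1/2}Z)\right] \ge \exp\left[\frac{2}{n}h(X)\right] \ge |I - A|^{1/n}\exp\left[\frac{2}{n}h(X)\right]$$

where the last inequality follows from the assumption that $0 \preceq A \preceq I$ and hence $0 \le |I - A| \le 1$. Since $|A|^{1/n} = 0$, this is exactly (3).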

Next, consider the general case when $|A| > 0$. The proof is rather long, so we divide it into several steps.

Step 1: Constructing a monotone path. To prove the generalized Costa EPI (3), we can equivalently show that

$$\exp\left[\frac{2}{n}h(X + Z)\right] \le |A|^{-1/n}\left\{\exp\left[\frac{2}{n}h(X + A^{1/2}Z)\right] - |I - A|^{1/n}\exp\left[\frac{2}{n}h(X)\right]\right\}. \tag{18}$$

Since X and Z are independent, we have

$$\begin{aligned}
h(X + A^{1/2}Z) - h(X) &= h(A^{-1/2}X + Z) - h(A^{-1/2}X)\\
&= h(A^{-1/2}X + Z) - h(A^{-1/2}X|Z)\\
&= I(Z; A^{-1/2}X + Z)
\end{aligned} \tag{19}$$

and

$$h(X + Z) - h(X) = I(Z; X + Z). \tag{20}$$

Divide both sides of (18) by $\exp[\frac{2}{n}h(X)]$ and use (19) and (20). Then, (18) can be equivalently written as

$$\exp\left[\frac{2}{n}I(Z; X + Z)\right] \le |A|^{-1/n}\left\{\exp\left[\frac{2}{n}I(Z; A^{-1/2}X + Z)\right] - |I - A|^{1/n}\right\}. \tag{21}$$

Let

$$F(D) := |D|^{2/n}\left\{\exp\left[\frac{2}{n}I(Z; DX + Z)\right] - |I - D^{-2}|^{1/n}\right\}. \tag{22}$$

With this definition, (21) can be equivalently written as

$$F(I) \le F(A^{-1/2}). \tag{23}$$

To show the inequality (23), it is sufficient to construct a family of $n \times n$ positive definite matrices $\{D(\gamma)\}_\gamma$ connecting I and $A^{-1/2}$ such that $F(D(\gamma))$ is monotone along the path. Unlike the scalar case, where there is only one path connecting 1 to $1/\sqrt{a}$, in the matrix case there are infinitely many paths connecting I and $A^{-1/2}$. Here, we consider the special choice

$$D(\gamma) := \left[I + \gamma(A^{-1} - I)\right]^{1/2} \tag{24}$$

and show that

$$\frac{dF(D(\gamma))}{d\gamma} \ge 0, \quad \forall \gamma \in [0, 1] \tag{25}$$

along this particular path.
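A quick numerical sanity check of the path (24), written as our own sketch with a hypothetical A satisfying $0 \prec A \preceq I$: $D(\gamma)$ shares its eigenvectors with A, so it can be formed from the eigenvalues of A. The code verifies the endpoints $D(0) = I$ and $D(1) = A^{-1/2}$, and that $D(\gamma)^2 = I + \gamma(A^{-1} - I)$ increases in the positive semidefinite order along the path.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
alpha = np.array([0.15, 0.6, 0.9])              # eigenvalues of A in (0, 1]
A = (Q * alpha) @ Q.T

def D(gamma):
    """D(gamma) = [I + gamma (A^{-1} - I)]^{1/2}, sharing eigenvectors with A."""
    return (Q * np.sqrt(1 + gamma * (1 / alpha - 1))) @ Q.T

# A^{-1/2} computed independently from an eigendecomposition of A itself
w, V = np.linalg.eigh(A)
A_inv_sqrt = (V * w**-0.5) @ V.T

gammas = np.linspace(0.0, 1.0, 21)
# Consecutive differences of D(gamma)^2 should be positive semidefinite
min_increment_eig = min(
    np.linalg.eigvalsh(D(g2) @ D(g2) - D(g1) @ D(g1)).min()
    for g1, g2 in zip(gammas[:-1], gammas[1:])
)
print(np.abs(D(0.0) - np.eye(n)).max(), np.abs(D(1.0) - A_inv_sqrt).max(), min_increment_eig)
```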

Step 2: Calculating the derivative. Following [14, Theorem 5], we have

$$I(Z; DX + Z) = I(X; DX + Z) + h(Z) - h(X) - \log|D|$$

and, since $Z - E[Z|DX + Z] = -D(X - E[X|DX + Z])$,

$$\mathrm{Cov}(X|DX + Z) = D^{-1}\mathrm{Cov}(Z|DX + Z)D^{-1}.$$

Let $N := \mathrm{Cov}(Z)$ and note that D is symmetric. We have

$$\begin{aligned}
\frac{\partial}{\partial D}I(Z; DX + Z) &= \frac{\partial}{\partial D}I(X; DX + Z) - D^{-1}\\
&= N^{-1}D\,\mathrm{Cov}(X|DX + Z) - D^{-1}\\
&= \left(N^{-1}\mathrm{Cov}(Z|DX + Z) - I\right)D^{-1}
\end{aligned} \tag{26}$$

where the second equality follows from the fundamental relationship between the derivative of mutual information and the MMSE estimate in linear vector Gaussian channels as stated in [18, Theorem 2].
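For Gaussian X, the gradient relationship used in (26) can be checked by finite differences. In the sketch below (ours; the covariances are hypothetical, mutual information is in nats, and the derivative is taken entrywise with respect to a general matrix argument), $I(X; HX + Z) = \frac{1}{2}\log(|HBH^T + N|/|N|)$ for $X \sim \mathcal{N}(0, B)$, and the numerical gradient at H = D is compared with $N^{-1}D\,\mathrm{Cov}(X|DX + Z)$. The identity $\mathrm{Cov}(X|DX + Z) = D^{-1}\mathrm{Cov}(Z|DX + Z)D^{-1}$ is checked for a symmetric invertible D as well.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
G = rng.standard_normal((n, n)); B = G @ G.T + 0.5 * np.eye(n)   # Cov(X)
G = rng.standard_normal((n, n)); N = G @ G.T + 0.5 * np.eye(n)   # Cov(Z)
G = rng.standard_normal((n, n)); D = G @ G.T + np.eye(n)         # symmetric, invertible

def mi(H):
    """I(X; HX + Z) in nats for X ~ N(0, B), Z ~ N(0, N)."""
    return 0.5 * (np.linalg.slogdet(H @ B @ H.T + N)[1] - np.linalg.slogdet(N)[1])

def cov_x_given_y(H):
    Sy = H @ B @ H.T + N
    return B - B @ H.T @ np.linalg.solve(Sy, H @ B)

def cov_z_given_y(H):
    Sy = H @ B @ H.T + N                      # Cov(Z, HX + Z) = N
    return N - N @ np.linalg.solve(Sy, N)

# Central finite differences over each entry of D
h = 1e-6
num_grad = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E_ij = np.zeros((n, n)); E_ij[i, j] = h
        num_grad[i, j] = (mi(D + E_ij) - mi(D - E_ij)) / (2 * h)

mmse_grad = np.linalg.solve(N, D @ cov_x_given_y(D))             # N^{-1} D Cov(X|Y)
D_inv = np.linalg.inv(D)
print(np.abs(num_grad - mmse_grad).max(),
      np.abs(cov_x_given_y(D) - D_inv @ cov_z_given_y(D) @ D_inv).max())
```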
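From (26), the derivative $\partial F/\partial D$ can be calculated as

$$\begin{aligned}
\frac{\partial F}{\partial D} &= \frac{2}{n}|D|^{2/n}\left\{\exp\left[\frac{2}{n}I(Z; DX + Z)\right] - |I - D^{-2}|^{1/n}\right\}D^{-1}\\
&\quad + |D|^{2/n}\left\{\frac{2}{n}\exp\left[\frac{2}{n}I(Z; DX + Z)\right]\left(N^{-1}\mathrm{Cov}(Z|DX + Z) - I\right) - \frac{2}{n}|I - D^{-2}|^{1/n}(D^2 - I)^{-1}\right\}D^{-1}\\
&= \frac{2}{n}|D|^{2/n}\left\{\exp\left[\frac{2}{n}I(Z; DX + Z)\right]N^{-1}\mathrm{Cov}(Z|DX + Z) - |I - D^{-2}|^{1/n}\left[I + (D^2 - I)^{-1}\right]\right\}D^{-1}.
\end{aligned} \tag{27}$$

The derivative $\partial D/\partial\gamma$ can be calculated as

$$\frac{\partial D}{\partial\gamma} = \frac{1}{2}\left[I + \gamma(A^{-1} - I)\right]^{-1/2}(A^{-1} - I) = \frac{1}{2\gamma}D^{-1}(D^2 - I) = \frac{1}{2\gamma}D(I - D^{-2}). \tag{28}$$

By (27), (28) and the chain rule of differentiation [24, Chapter 17.5],

$$\begin{aligned}
\frac{dF}{d\gamma} &= \mathrm{Tr}\left\{\frac{\partial F}{\partial D}\frac{\partial D}{\partial\gamma}\right\}\\
&= \frac{|D|^{2/n}}{n\gamma}\mathrm{Tr}\left\{\exp\left[\frac{2}{n}I(Z; DX + Z)\right]N^{-1}\mathrm{Cov}(Z|DX + Z)(I - D^{-2}) - |I - D^{-2}|^{1/n}I\right\}\\
&= \frac{|D|^{2/n}}{\gamma}\exp\left[\frac{2}{n}I(Z; DX + Z)\right]\left\{\frac{1}{n}\mathrm{Tr}\left[N^{-1}\mathrm{Cov}(Z|DX + Z)(I - D^{-2})\right] - |I - D^{-2}|^{1/n}\exp\left[-\frac{2}{n}I(Z; DX + Z)\right]\right\}
\end{aligned} \tag{29}$$

where the second equality uses $\left[I + (D^2 - I)^{-1}\right](I - D^{-2}) = I$.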
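The two matrix facts used to simplify (29), namely the closed form (28) of $\partial D/\partial\gamma$ and the identity $[I + (D^2 - I)^{-1}](I - D^{-2}) = I$, can be spot-checked numerically. The sketch below is ours; A is hypothetical with $0 \prec A \prec I$, so that $D^2 - I$ is invertible for $\gamma > 0$, and (28) is compared with a central finite difference at $\gamma = 0.4$.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
alpha = np.array([0.2, 0.5, 0.8])                   # eigenvalues of A, strictly inside (0, 1)

def D(gamma):
    """D(gamma) = [I + gamma (A^{-1} - I)]^{1/2} from (24)."""
    return (Q * np.sqrt(1 + gamma * (1 / alpha - 1))) @ Q.T

gamma, h = 0.4, 1e-6
I = np.eye(n)
Dg = D(gamma)
Dinv = np.linalg.inv(Dg)

fd = (D(gamma + h) - D(gamma - h)) / (2 * h)        # numerical dD/dgamma
closed_form = Dinv @ (Dg @ Dg - I) / (2 * gamma)    # middle expression in (28)
identity = (I + np.linalg.inv(Dg @ Dg - I)) @ (I - Dinv @ Dinv)
print(np.abs(fd - closed_form).max(), np.abs(identity - I).max())
```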
Step 3: Proving $dF/d\gamma \ge 0$. The mutual information $I(Z; DX + Z)$ can be bounded from below as follows:

$$\begin{aligned}
I(Z; DX + Z) &\ge I(Z; E[Z|DX + Z])\\
&= h(Z) - h(Z|E[Z|DX + Z])\\
&= \frac{1}{2}\log(2\pi e)^n|N| - h(Z - E[Z|DX + Z]\,|\,E[Z|DX + Z])\\
&\ge \frac{1}{2}\log(2\pi e)^n|N| - h(Z - E[Z|DX + Z])\\
&\ge \frac{1}{2}\log(2\pi e)^n|N| - \frac{1}{2}\log(2\pi e)^n|\mathrm{Cov}(Z|DX + Z)|\\
&= \frac{1}{2}\log\frac{|N|}{|\mathrm{Cov}(Z|DX + Z)|}.
\end{aligned} \tag{30}$$

Here, the first inequality follows from the Markov relation

$$Z \to DX + Z \to E[Z|DX + Z]$$

and the chain rule of mutual information [25, Chapter 2.8], i.e., the data-processing inequality; the second inequality follows from the fact that conditioning reduces differential entropy [25, Chapter 9.6]; and the third inequality follows from the well-known fact that the Gaussian distribution maximizes differential entropy for a given covariance matrix [25, Chapter 9.6], noting that $Z - E[Z|DX + Z]$ has zero mean and covariance matrix $\mathrm{Cov}(Z|DX + Z)$. By (30),

$$|I - D^{-2}|^{1/n}\exp\left[-\frac{2}{n}I(Z; DX + Z)\right] \le \left|N^{-1}\mathrm{Cov}(Z|DX + Z)(I - D^{-2})\right|^{1/n} \le \frac{1}{n}\mathrm{Tr}\left[N^{-1}\mathrm{Cov}(Z|DX + Z)(I - D^{-2})\right] \tag{31}$$

where the last inequality follows from the well-known inequality of arithmetic and geometric means [26, p. 136]. Finally, substituting (31) into (29) establishes that $dF/d\gamma \ge 0$ for all $\gamma \in [0, 1]$. In particular, we have $F(D(1)) \ge F(D(0))$. This proves the desired inequality (21) and hence the generalized Costa EPI (3).
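When X is Gaussian, every inequality in (30) holds with equality: $E[Z|DX + Z]$ is then an invertible linear function of $DX + Z$, and the estimation error is Gaussian and independent of the estimate. The sketch below (ours, with hypothetical covariances and natural logarithms) confirms numerically that $I(Z; DX + Z) = \frac{1}{2}\log(|DBD + N|/|DBD|)$ matches the bound $\frac{1}{2}\log(|N|/|\mathrm{Cov}(Z|DX + Z)|)$, and spot-checks the arithmetic-geometric mean step $|M|^{1/n} \le \frac{1}{n}\mathrm{Tr}(M)$ on a positive semidefinite M.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
G = rng.standard_normal((n, n)); B = G @ G.T + 0.5 * np.eye(n)   # Cov(X), X Gaussian
G = rng.standard_normal((n, n)); N = G @ G.T + 0.5 * np.eye(n)   # Cov(Z)
G = rng.standard_normal((n, n)); D = G @ G.T + np.eye(n)         # symmetric D

logdet = lambda M: np.linalg.slogdet(M)[1]
DBD = D @ B @ D
Sy = DBD + N                                                     # Cov(DX + Z)

mi_exact = 0.5 * (logdet(Sy) - logdet(DBD))                      # h(DX+Z) - h(DX)
cov_z_given_y = N - N @ np.linalg.solve(Sy, N)                   # Cov(Z | DX + Z)
mi_bound = 0.5 * (logdet(N) - logdet(cov_z_given_y))             # last line of (30)

# AM-GM step: for a PSD matrix M, |M|^{1/n} <= Tr(M)/n
G = rng.standard_normal((n, n)); M = G @ G.T
eigs = np.clip(np.linalg.eigvalsh(M), 0.0, None)
geo = np.prod(eigs) ** (1.0 / n)
arith = np.trace(M) / n
print(mi_exact, mi_bound, geo, arith)
```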



IV. Proof of Theorem 2

In this section, we prove the extremal entropy inequality (7) as stated in Theorem 2. We will first state a series of corollaries of Theorem 1 as intermediate results leading to Theorem 2. Based on the final corollary, we will prove Theorem 2 using an enhancement argument.

Corollary 1: Let Z be a Gaussian random n-vector with a positive definite covariance matrix, and let A be an $n \times n$ real symmetric matrix such that $0 \preceq A \preceq I$. Then

$$\exp\left[\frac{2}{n}h(X + A^{1/2}Z|U)\right] \ge |I - A|^{1/n}\exp\left[\frac{2}{n}h(X|U)\right] + |A|^{1/n}\exp\left[\frac{2}{n}h(X + Z|U)\right] \tag{32}$$

for any (X, U) independent of Z.
Corollary 2: Let $Z_1$, $Z_2$ and $Z_3$ be Gaussian random n-vectors with positive definite covariance matrices $N_1$, $N_2$ and $N_3$, respectively. Assume that $N_1 \preceq N_3$. If there exists an $n \times n$ positive semidefinite matrix $B^*$ such that

$$(B^* + N_1)^{-1} + \mu(B^* + N_3)^{-1} = (1 + \mu)(B^* + N_2)^{-1} \tag{33}$$

for some real scalar $\mu > 0$, then

$$h(X + Z_1|U) + \mu h(X + Z_3|U) - (1 + \mu)h(X + Z_2|U) \le \frac{1}{2}\log|B^* + N_1| + \frac{\mu}{2}\log|B^* + N_3| - \frac{1 + \mu}{2}\log|B^* + N_2| \tag{34}$$

for any (X, U) independent of $(Z_1, Z_2, Z_3)$.

Corollary 3: Let $Z_k$, $k = 0, \ldots, K$, be a collection of $K + 1$ Gaussian random n-vectors with respective positive definite covariance matrices $N_k$. Assume that $N_1 \preceq \cdots \preceq N_K$. If there exists an $n \times n$ positive semidefinite matrix $B^*$ such that

$$\sum_{k=1}^{K}\mu_k(B^* + N_k)^{-1} = (B^* + N_0)^{-1} \tag{35}$$

for some $\mu_k \ge 0$ with $\sum_{k=1}^{K}\mu_k = 1$, then

$$\sum_{k=1}^{K}\mu_k h(X + Z_k|U) - h(X + Z_0|U) \le \sum_{k=1}^{K}\frac{\mu_k}{2}\log|B^* + N_k| - \frac{1}{2}\log|B^* + N_0| \tag{36}$$

for any (X, U) independent of $(Z_0, \ldots, Z_K)$.

Proofs of Corollaries 1, 2 and 3 can be found in Appendices II, III and IV, respectively. We are now ready to prove Theorem 2. Note that the special case with $M_1 = M_2 = 0$ was proved in Corollary 3. To extend the result of Corollary 3 to nonzero $M_1$ and $M_2$, we will consider an enhancement argument, which was first introduced by Weingarten, Steinberg and Shamai in [7].

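Let $\tilde{N}_1$ and $\tilde{N}_0$ be $n \times n$ real symmetric matrices such that

$$\mu_1(B^* + \tilde{N}_1)^{-1} = \mu_1(B^* + N_1)^{-1} + M_1 \tag{37}$$

and

$$(B^* + \tilde{N}_0)^{-1} = (B^* + N_0)^{-1} + M_2. \tag{38}$$

As shown in [7, Lemmas 11 and 12], $\tilde{N}_1$ and $\tilde{N}_0$ satisfy the following properties:

$$0 \prec \tilde{N}_1 = \left(N_1^{-1} + \mu_1^{-1}M_1\right)^{-1} \preceq N_1, \tag{39}$$

$$\tilde{N}_1 \preceq \tilde{N}_0 \preceq N_0, \tag{40}$$

and

$$\frac{|B^* + \tilde{N}_1|}{|\tilde{N}_1|} = \frac{|B^* + N_1|}{|N_1|}, \tag{41}$$

$$\frac{|S + \tilde{N}_0|}{|B^* + \tilde{N}_0|} = \frac{|S + N_0|}{|B^* + N_0|}. \tag{42}$$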
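Properties (37), (39) and (41) of the enhanced noise covariance can be illustrated numerically. In the sketch below (ours; all matrices are hypothetical), $B^*$ is singular and $M_1$ is supported on the null space of $B^*$, so that $B^*M_1 = 0$ as required by (5); $\tilde{N}_1$ is then formed as in (39), and (37), (41) and $\tilde{N}_1 \preceq N_1$ are checked.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
B_star = (Q * np.array([2.0, 1.0, 0.0])) @ Q.T        # PSD, null space spanned by Q[:, 2]
M1 = 0.7 * np.outer(Q[:, 2], Q[:, 2])                 # PSD with B* M1 = 0
G = rng.standard_normal((n, n)); N1 = G @ G.T + 0.5 * np.eye(n)
mu1 = 0.6

inv = np.linalg.inv
N1_tilde = inv(inv(N1) + M1 / mu1)                    # enhanced covariance, as in (39)

lhs_37 = mu1 * inv(B_star + N1_tilde)
rhs_37 = mu1 * inv(B_star + N1) + M1
ratio_tilde = np.linalg.det(B_star + N1_tilde) / np.linalg.det(N1_tilde)
ratio = np.linalg.det(B_star + N1) / np.linalg.det(N1)
min_eig_gap = np.linalg.eigvalsh(N1 - N1_tilde).min()  # N1_tilde <= N1 in the PSD order

print(np.abs(B_star @ M1).max(), np.abs(lhs_37 - rhs_37).max(), ratio_tilde - ratio, min_eig_gap)
```

The construction relies on $B^*M_1 = 0$: enhancing the noise only in directions where $B^*$ puts no power leaves the ratio in (41) unchanged.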

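Let $\tilde{Z}_0$ and $\tilde{Z}_1$ be two Gaussian n-vectors with covariance matrices $\tilde{N}_0$ and $\tilde{N}_1$, respectively. Note from (39) that $\tilde{N}_1 \preceq N_1 \preceq N_2 \preceq \cdots \preceq N_K$. Moreover, substituting (37) and (38) into (4), we have

$$\mu_1(B^* + \tilde{N}_1)^{-1} + \sum_{k=2}^{K}\mu_k(B^* + N_k)^{-1} = (B^* + \tilde{N}_0)^{-1}. \tag{43}$$

Thus, by Corollary 3,

$$\mu_1 h(X + \tilde{Z}_1|U) + \sum_{k=2}^{K}\mu_k h(X + Z_k|U) - h(X + \tilde{Z}_0|U) \le \frac{\mu_1}{2}\log|B^* + \tilde{N}_1| + \sum_{k=2}^{K}\frac{\mu_k}{2}\log|B^* + N_k| - \frac{1}{2}\log|B^* + \tilde{N}_0| \tag{44}$$

for any (X, U) independent of $(\tilde{Z}_0, \tilde{Z}_1, Z_2, \ldots, Z_K)$.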

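On the other hand, note from (39) that $\tilde{N}_1 \preceq N_1$, so that $X + Z_1$ is statistically equivalent to $X + \tilde{Z}_1$ plus independent Gaussian noise with covariance matrix $N_1 - \tilde{N}_1$. We have

$$I(X; X + Z_1|U) \le I(X; X + \tilde{Z}_1|U)$$

for any (X, U) independent of $(Z_1, \tilde{Z}_1)$. Thus,

$$h(X + Z_1|U) - h(X + \tilde{Z}_1|U) \le h(Z_1) - h(\tilde{Z}_1) = \frac{1}{2}\log\frac{|N_1|}{|\tilde{N}_1|} = \frac{1}{2}\log\frac{|B^* + N_1|}{|B^* + \tilde{N}_1|} \tag{45}$$

where the last equality follows from (41).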
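Also note from (40) that $\tilde{N}_0 \preceq N_0$. Let $\hat{Z}_0$ be a Gaussian n-vector with covariance matrix $N_0 - \tilde{N}_0$, independent of $(\tilde{Z}_0, X, U)$, so that $\tilde{Z}_0 + \hat{Z}_0$ has the same distribution as $Z_0$. We have

$$\begin{aligned}
h(X + Z_0|U) - h(X + \tilde{Z}_0|U) &= h(X + \tilde{Z}_0 + \hat{Z}_0|U) - h(X + \tilde{Z}_0|U)\\
&= I(\hat{Z}_0; X + \tilde{Z}_0 + \hat{Z}_0|U)\\
&\ge I(\hat{Z}_0; X + \tilde{Z}_0 + \hat{Z}_0) \qquad \text{(46)}\\
&\ge \frac{1}{2}\log\frac{|\mathrm{Cov}(X) + N_0|}{|\mathrm{Cov}(X) + \tilde{N}_0|}\\
&\ge \frac{1}{2}\log\frac{|S + N_0|}{|S + \tilde{N}_0|}\\
&= \frac{1}{2}\log\frac{|B^* + N_0|}{|B^* + \tilde{N}_0|} \qquad \text{(47)}
\end{aligned}$$

for any (X, U) independent of $(\tilde{Z}_0, \hat{Z}_0)$ such that $E[XX^T] \preceq S$. Here, the first inequality follows from the independence of $\hat{Z}_0$ and U; the second inequality follows from the worst noise result [27, Lemma II.2]; the third inequality follows from the fact that $\tilde{N}_0 \preceq N_0$ and $\mathrm{Cov}(X) \preceq E[XX^T] \preceq S$ (the function $C \mapsto \log|C + N_0| - \log|C + \tilde{N}_0|$ is nonincreasing in the positive semidefinite order when $\tilde{N}_0 \preceq N_0$); and the last equality follows from (42).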
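The third inequality in (46)-(47) rests on the monotonicity of $C \mapsto \log|C + N_0| - \log|C + \tilde{N}_0|$ in the positive semidefinite order whenever $\tilde{N}_0 \preceq N_0$. The sketch below (ours; random hypothetical matrices) spot-checks this with $C = \mathrm{Cov}(X) \preceq S$ and counts violations.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 3
logdet = lambda M: np.linalg.slogdet(M)[1]

def rand_psd(scale=1.0):
    G = rng.standard_normal((n, n))
    return scale * G @ G.T

def gap(C, N0, N0_tilde):
    """log|C + N0| - log|C + N0_tilde|."""
    return logdet(C + N0) - logdet(C + N0_tilde)

violations = 0
for _ in range(500):
    N0_tilde = rand_psd() + 0.1 * np.eye(n)
    N0 = N0_tilde + rand_psd(0.5)           # N0_tilde <= N0
    C = rand_psd()                          # plays the role of Cov(X)
    S = C + rand_psd()                      # Cov(X) <= S
    if gap(C, N0, N0_tilde) < gap(S, N0, N0_tilde) - 1e-10:
        violations += 1
print("violations:", violations)
```

The monotonicity follows from the derivative $\mathrm{Tr}\{[(C + N_0)^{-1} - (C + \tilde{N}_0)^{-1}]\Delta C\} \le 0$ for any $\Delta C \succeq 0$.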

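Finally, putting together (44), (45) and (47), we obtain

$$\begin{aligned}
&\sum_{k=1}^{K}\mu_k h(X + Z_k|U) - h(X + Z_0|U)\\
&= \mu_1 h(X + \tilde{Z}_1|U) + \sum_{k=2}^{K}\mu_k h(X + Z_k|U) - h(X + \tilde{Z}_0|U)\\
&\quad + \mu_1\left[h(X + Z_1|U) - h(X + \tilde{Z}_1|U)\right] - \left[h(X + Z_0|U) - h(X + \tilde{Z}_0|U)\right]\\
&\le \frac{\mu_1}{2}\log|B^* + \tilde{N}_1| + \sum_{k=2}^{K}\frac{\mu_k}{2}\log|B^* + N_k| - \frac{1}{2}\log|B^* + \tilde{N}_0| + \frac{\mu_1}{2}\log\frac{|B^* + N_1|}{|B^* + \tilde{N}_1|} - \frac{1}{2}\log\frac{|B^* + N_0|}{|B^* + \tilde{N}_0|}\\
&= \sum_{k=1}^{K}\frac{\mu_k}{2}\log|B^* + N_k| - \frac{1}{2}\log|B^* + N_0|
\end{aligned}$$

for any (X, U) independent of $(Z_0, Z_1, \ldots, Z_K)$ such that $E[XX^T] \preceq S$. This completes the proof of Theorem 2.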

V. Proof of Theorem 5

In this section, we prove Theorem 5. Note that the achievability of the secrecy rate region (16) can be obtained from the secrecy rate region (14) by letting U and V be two independent Gaussian vectors with zero means and covariance matrices $S - B$ and B, respectively, and $X = U + V$. We therefore concentrate on the converse part of the theorem.
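The evaluation behind this achievability claim can be checked mechanically: with $X = U + V$ as above and $Y_k = X + Z_k$, the Gaussian mutual informations in (14) reduce to the log-determinant expressions in (16). The sketch below is ours (the matrices are hypothetical, rates are in nats): it builds the joint covariance of $(U, X, Y_1, Y_2, Y_3)$, computes $I(X; Y_1|U) - I(X; Y_2|U)$ and $I(U; Y_2) - I(U; Y_3)$ from generic Gaussian entropy formulas, and compares them with (16).

```python
import numpy as np

n = 2
logdet = lambda M: np.linalg.slogdet(M)[1]
S = np.array([[3.0, 0.5], [0.5, 2.0]])
B = np.array([[1.0, 0.2], [0.2, 0.8]])                             # 0 < B < S (hypothetical)
Ns = [np.diag([1.0, 0.5]), np.diag([1.5, 1.0]), np.diag([2.5, 2.0])]  # N1 <= N2 <= N3

# Independent sources (U, V, Z1, Z2, Z3) with block-diagonal covariance
blocks = [S - B, B] + Ns
cov_w = np.zeros((5 * n, 5 * n))
for i, blk in enumerate(blocks):
    cov_w[i*n:(i+1)*n, i*n:(i+1)*n] = blk

# Linear map to (U, X, Y1, Y2, Y3) with X = U + V and Y_k = X + Z_k
I2, O = np.eye(n), np.zeros((n, n))
T = np.block([[I2, O, O, O, O],
              [I2, I2, O, O, O],
              [I2, I2, I2, O, O],
              [I2, I2, O, I2, O],
              [I2, I2, O, O, I2]])
cov = T @ cov_w @ T.T
idx = {name: list(range(k*n, (k+1)*n)) for k, name in enumerate(["U", "X", "Y1", "Y2", "Y3"])}

def h(*names):
    """Gaussian differential entropy (nats) of the listed sub-vectors."""
    ids = sum((idx[v] for v in names), [])
    return 0.5 * (len(ids) * np.log(2 * np.pi * np.e) + logdet(cov[np.ix_(ids, ids)]))

def cmi(a, b, c):
    """I(a; b | c) = h(a, c) + h(b, c) - h(a, b, c) - h(c)."""
    return h(a, c) + h(b, c) - h(a, b, c) - h(c)

r1 = cmi("X", "Y1", "U") - cmi("X", "Y2", "U")
r2 = (h("U") + h("Y2") - h("U", "Y2")) - (h("U") + h("Y3") - h("U", "Y3"))
r1_16 = 0.5 * (logdet(B + Ns[0]) - logdet(Ns[0])) - 0.5 * (logdet(B + Ns[1]) - logdet(Ns[1]))
r2_16 = 0.5 * (logdet(S + Ns[1]) - logdet(B + Ns[1])) - 0.5 * (logdet(S + Ns[2]) - logdet(B + Ns[2]))
print(r1, r1_16, r2, r2_16)
```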
To show that (16) is indeed the secrecy capacity region of the vector Gaussian broadcast channel (8), we use a proof by contradiction. Assume that $(R_1^\circ, R_2^\circ)$ is an achievable secrecy rate pair that lies outside the secrecy rate region (16). Note that $N_1 \preceq N_2$. From [28, Theorem 1], we can bound $R_1^\circ$ by

$$R_1^\circ \le \frac{1}{2}\log\frac{|S + N_1|}{|N_1|} - \frac{1}{2}\log\frac{|S + N_2|}{|N_2|} =: R_1^{\max}.$$

Note that when $R_2^\circ = 0$, $R_1^{\max}$ is achievable by letting $B = S$ in (16). Thus, we may assume that $R_2^\circ > 0$ and write $R_1^\circ = R_1^* + \delta$ for some $\delta > 0$, where $R_1^*$ is given by

$$\begin{aligned}
R_1^* = \max_{B}\quad & \frac{1}{2}\log\frac{|B + N_1|}{|N_1|} - \frac{1}{2}\log\frac{|B + N_2|}{|N_2|}\\
\text{subject to:}\quad & 0 \preceq B \preceq S\\
& \frac{1}{2}\log\frac{|S + N_2|}{|B + N_2|} - \frac{1}{2}\log\frac{|S + N_3|}{|B + N_3|} \ge R_2^\circ.
\end{aligned}$$



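Let $B^*$ be an optimal solution to the above optimization program. Then, $B^*$ must satisfy the following KKT conditions (see footnote 1):

$$(B^* + N_1)^{-1} + \mu(B^* + N_3)^{-1} + M_1 = (1 + \mu)(B^* + N_2)^{-1} + M_2 \tag{48}$$

$$B^*M_1 = 0 \tag{49}$$

and

$$(S - B^*)M_2 = 0 \tag{50}$$

where $M_1$ and $M_2$ are $n \times n$ positive semidefinite matrices, and $\mu$ is a nonnegative real scalar such that $\mu > 0$ if and only if

$$\frac{1}{2}\log\frac{|S + N_2|}{|B^* + N_2|} - \frac{1}{2}\log\frac{|S + N_3|}{|B^* + N_3|} = R_2^\circ.$$

Thus,

$$R_1^\circ + \mu R_2^\circ = \frac{1}{2}\log\frac{|B^* + N_1|}{|N_1|} - \frac{1}{2}\log\frac{|B^* + N_2|}{|N_2|} + \frac{\mu}{2}\log\frac{|S + N_2|}{|B^* + N_2|} - \frac{\mu}{2}\log\frac{|S + N_3|}{|B^* + N_3|} + \delta. \tag{51}$$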



¹ As this optimization program is not convex, a set of constraint qualifications (CQs) should be checked to make sure that the KKT conditions indeed hold. The CQs stated in Appendix IV of [7] hold in a trivial manner for this program.
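On the other hand, by the converse part of Theorem 3,

$$\begin{aligned}
R_1^\circ + \mu R_2^\circ &\le \left[I(X; X + Z_1|U) - I(X; X + Z_2|U)\right] + \mu\left[I(U; X + Z_2) - I(U; X + Z_3)\right]\\
&= \left[h(Z_2) - h(Z_1)\right] - \mu\left[h(X + Z_3) - h(X + Z_2)\right]\\
&\quad + \left[h(X + Z_1|U) + \mu h(X + Z_3|U) - (1 + \mu)h(X + Z_2|U)\right]\\
&= \frac{1}{2}\log\frac{|N_2|}{|N_1|} - \mu\left[h(X + Z_3) - h(X + Z_2)\right]\\
&\quad + \left[h(X + Z_1|U) + \mu h(X + Z_3|U) - (1 + \mu)h(X + Z_2|U)\right]
\end{aligned} \tag{52}$$

for some jointly distributed (U, X) independent of $(Z_1, Z_2, Z_3)$ such that $E[XX^T] \preceq S$. Note that $N_2 \preceq N_3$. Similar to (47), we may obtain

$$h(X + Z_3) - h(X + Z_2) \ge \frac{1}{2}\log\frac{|S + N_3|}{|S + N_2|}. \tag{53}$$

Moreover, by letting

$$\mu_1 = \frac{1}{1 + \mu}, \quad \mu_3 = \frac{\mu}{1 + \mu}, \quad \tilde{M}_1 = \frac{M_1}{1 + \mu}, \quad \text{and} \quad \tilde{M}_2 = \frac{M_2}{1 + \mu},$$

we can rewrite the KKT conditions (48)-(50) as

$$\begin{aligned}
\mu_1(B^* + N_1)^{-1} + \mu_3(B^* + N_3)^{-1} + \tilde{M}_1 &= (B^* + N_2)^{-1} + \tilde{M}_2\\
B^*\tilde{M}_1 &= 0\\
(S - B^*)\tilde{M}_2 &= 0.
\end{aligned}$$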

Thus, by Theorem 2,

$$h(X + Z_1|U) + \mu h(X + Z_3|U) - (1 + \mu)h(X + Z_2|U) \le \frac{1}{2}\log|B^* + N_1| + \frac{\mu}{2}\log|B^* + N_3| - \frac{1 + \mu}{2}\log|B^* + N_2|. \tag{54}$$

Substituting (53) and (54) into (52), we have

$$\begin{aligned}
R_1^\circ + \mu R_2^\circ &\le \frac{1}{2}\log\frac{|N_2|}{|N_1|} - \frac{\mu}{2}\log\frac{|S + N_3|}{|S + N_2|} + \frac{1}{2}\log|B^* + N_1| + \frac{\mu}{2}\log|B^* + N_3| - \frac{1 + \mu}{2}\log|B^* + N_2|\\
&= \frac{1}{2}\log\frac{|B^* + N_1|}{|N_1|} - \frac{1}{2}\log\frac{|B^* + N_2|}{|N_2|} + \frac{\mu}{2}\log\frac{|S + N_2|}{|B^* + N_2|} - \frac{\mu}{2}\log\frac{|S + N_3|}{|B^* + N_3|}.
\end{aligned} \tag{55}$$

Thus, since $\delta > 0$, we have obtained a contradiction between (51) and (55). As a result, all the achievable rate pairs must be inside the secrecy rate region (16). This completes the proof of the theorem.
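The regrouping in the last step of (55) is pure log-determinant algebra, so it holds for any positive definite matrices. As a sketch (ours; random hypothetical matrices and a hypothetical value of μ), the snippet evaluates both lines of (55) and confirms that they coincide, so the comparison with (51) comes down to the extra term δ.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 3
ld = lambda M: np.linalg.slogdet(M)[1]

def rand_pd():
    G = rng.standard_normal((n, n))
    return G @ G.T + 0.2 * np.eye(n)

N1 = rand_pd(); N2 = N1 + rand_pd(); N3 = N2 + rand_pd()
B_star = rand_pd(); S = B_star + rand_pd()
mu = 0.75

first_line = (0.5 * (ld(N2) - ld(N1)) - 0.5 * mu * (ld(S + N3) - ld(S + N2))
              + 0.5 * ld(B_star + N1) + 0.5 * mu * ld(B_star + N3)
              - 0.5 * (1 + mu) * ld(B_star + N2))
second_line = (0.5 * (ld(B_star + N1) - ld(N1)) - 0.5 * (ld(B_star + N2) - ld(N2))
               + 0.5 * mu * (ld(S + N2) - ld(B_star + N2))
               - 0.5 * mu * (ld(S + N3) - ld(B_star + N3)))
print(first_line, second_line)
```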



VI. Proof of Theorem 6

In this section, we prove Theorem 6 following steps similar to those used in the proof of Theorem 5. The achievability of the secrecy rate region (17) can be obtained from the secrecy rate region (15) by letting U and V be two independent Gaussian vectors with zero means and covariance matrices $S - B$ and B, respectively, and $X = U + V$. We therefore concentrate on the converse part of the theorem.

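To show that (17) is indeed the secrecy capacity region of the vector Gaussian broadcast channel (8), we will use proof by contradiction. Assume that $(R_1^\circ, R_2^\circ)$ is an achievable secrecy rate pair that lies outside the secrecy rate region (17). Note that $N_1 \preceq N_3$. From [28, Theorem 1], we can bound $R_1^\circ$ by

$$R_1^\circ \le \frac{1}{2}\log\frac{|S + N_1|}{|N_1|} - \frac{1}{2}\log\frac{|S + N_3|}{|N_3|} =: R_1^{\max}.$$

Note that when $R_2^\circ = 0$, $R_1^{\max}$ is achievable by letting $B = S$ in (17). Thus, we may assume that $R_2^\circ > 0$ and write $R_1^\circ = R_1^* + \delta$ for some $\delta > 0$, where $R_1^*$ is given by

$$\begin{aligned}
R_1^* = \max_{B}\quad & \frac{1}{2}\log\frac{|B + N_1|}{|N_1|} - \frac{1}{2}\log\frac{|B + N_3|}{|N_3|}\\
\text{subject to:}\quad & 0 \preceq B \preceq S\\
& \frac{1}{2}\log\frac{|S + N_2|}{|B + N_2|} - \frac{1}{2}\log\frac{|S + N_3|}{|B + N_3|} \ge R_2^\circ.
\end{aligned}$$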

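Let $B^*$ be an optimal solution to the above optimization program. Then, $B^*$ must satisfy the following KKT conditions:

$$(B^* + N_1)^{-1} + (\mu - 1)(B^* + N_3)^{-1} + M_1 = \mu(B^* + N_2)^{-1} + M_2 \tag{56}$$

$$B^*M_1 = 0 \tag{57}$$

and

$$(S - B^*)M_2 = 0 \tag{58}$$

where $M_1$ and $M_2$ are $n \times n$ positive semidefinite matrices, and $\mu$ is a nonnegative real scalar such that $\mu > 1$ (see footnote 2). Therefore,

$$R_2^\circ = \frac{1}{2}\log\frac{|S + N_2|}{|B^* + N_2|} - \frac{1}{2}\log\frac{|S + N_3|}{|B^* + N_3|}$$

and

$$R_1^\circ + \mu R_2^\circ = \frac{1}{2}\log\frac{|B^* + N_1|}{|N_1|} - \frac{1}{2}\log\frac{|B^* + N_3|}{|N_3|} + \frac{\mu}{2}\log\frac{|S + N_2|}{|B^* + N_2|} - \frac{\mu}{2}\log\frac{|S + N_3|}{|B^* + N_3|} + \delta. \tag{59}$$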



² If $\mu \le 1$, it is easy to see that $B^* = S$ is an optimal solution, which contradicts the assumption that $R_2^\circ > 0$.



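On the other hand, by the converse part of Theorem 4,

$$\begin{aligned}
R_1^\circ + \mu R_2^\circ &\le \left[I(X; X + Z_1|U) - I(X; X + Z_3|U)\right] + \mu\left[I(U; X + Z_2) - I(U; X + Z_3)\right]\\
&= \left[h(Z_3) - h(Z_1)\right] - \mu\left[h(X + Z_3) - h(X + Z_2)\right]\\
&\quad + \left[h(X + Z_1|U) + (\mu - 1)h(X + Z_3|U) - \mu h(X + Z_2|U)\right]\\
&\le \frac{1}{2}\log\frac{|N_3|}{|N_1|} - \frac{\mu}{2}\log\frac{|S + N_3|}{|S + N_2|}\\
&\quad + \left[h(X + Z_1|U) + (\mu - 1)h(X + Z_3|U) - \mu h(X + Z_2|U)\right]
\end{aligned} \tag{60}$$

for some jointly distributed (U, X) independent of $(Z_1, Z_2, Z_3)$ such that $E[XX^T] \preceq S$, where the last inequality follows from (53). Since $\mu > 1$, by letting

$$\mu_1 = \frac{1}{\mu}, \quad \mu_3 = \frac{\mu - 1}{\mu}, \quad \tilde{M}_1 = \frac{M_1}{\mu}, \quad \text{and} \quad \tilde{M}_2 = \frac{M_2}{\mu},$$

we can rewrite the KKT conditions (56)-(58) as

$$\begin{aligned}
\mu_1(B^* + N_1)^{-1} + \mu_3(B^* + N_3)^{-1} + \tilde{M}_1 &= (B^* + N_2)^{-1} + \tilde{M}_2\\
B^*\tilde{M}_1 &= 0\\
(S - B^*)\tilde{M}_2 &= 0.
\end{aligned}$$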



Thus, by Theorem 2,

$$h(X + Z_1|U) + (\mu - 1)h(X + Z_3|U) - \mu h(X + Z_2|U) \le \frac{1}{2}\log|B^* + N_1| + \frac{\mu - 1}{2}\log|B^* + N_3| - \frac{\mu}{2}\log|B^* + N_2|. \tag{61}$$



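Substituting (61) into (60), we have

$$\begin{aligned}
R_1^\circ + \mu R_2^\circ &\le \frac{1}{2}\log\frac{|N_3|}{|N_1|} - \frac{\mu}{2}\log\frac{|S + N_3|}{|S + N_2|} + \frac{1}{2}\log|B^* + N_1| + \frac{\mu - 1}{2}\log|B^* + N_3| - \frac{\mu}{2}\log|B^* + N_2|\\
&= \frac{1}{2}\log\frac{|B^* + N_1|}{|N_1|} - \frac{1}{2}\log\frac{|B^* + N_3|}{|N_3|} + \frac{\mu}{2}\log\frac{|S + N_2|}{|B^* + N_2|} - \frac{\mu}{2}\log\frac{|S + N_3|}{|B^* + N_3|}.
\end{aligned} \tag{62}$$

Thus, since $\delta > 0$, we have obtained a contradiction between (59) and (62). As a result, all the achievable rate pairs must be inside the secrecy rate region (17). This completes the proof of the theorem.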
VII. Conclusions

This paper has considered an EPI of Costa and has established a natural generalization by replacing the scalar
parameter in the original Costa EPI with a matrix one. The generalized Costa EPI has been proven using a 
perturbation approach via a fundamental relationship between the derivative of mutual information and the MMSE 
in linear vector Gaussian channels. This is an example of how the connections between information theory and 
statistics can be explored to provide new mathematical tools for information theory. 

As an application, a new extremal entropy inequality has been derived from the generalized Costa EPI and then 
used to characterize the secrecy capacity regions of the degraded vector Gaussian broadcast channel problem with 
layered confidential messages. We expect that the generalized Costa EPI will also play important roles in solving 
some other Gaussian network communication problems. 

Appendix I 
Proof of Theorem 3

A. Achievability 

We first show that the secrecy rate region (14) is achievable. Following the idea of superposition coding for the
degraded broadcast channel [3], we introduce an auxiliary codebook which can be distinguished by both receiver 
1 and receiver 2. The codebook is generated using random binning [20], [21]. 

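Fix $p(u)$ and $p(x|u)$ and let

$$R_1' = I(X;Y_2|U) - \epsilon_1 \tag{63a}$$

$$\text{and}\quad R_2' = I(U;Y_3) - \epsilon_1 \tag{63b}$$

for some $\epsilon_1 > 0$. Let

$$L_k = 2^{nR_k},\quad J_k = 2^{nR_k'},\quad\text{and}\quad T_k = L_k J_k,\qquad k = 1,2.$$

Without loss of generality, $L_k$, $J_k$ and $T_k$ are assumed to be integers.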

Codebook generation: Generate $T_2$ independent codewords of length $n$ according to $\prod_{i=1}^n p(u_i)$ and label them as

$$u^n(w_2,j_2),\quad w_2 \in \{1,\ldots,L_2\},\quad j_2 \in \{1,\ldots,J_2\}.$$

For each codeword $u^n(w_2,j_2)$, generate $T_1$ independent codewords according to $\prod_{i=1}^n p(x_i|u_i)$ and label them as

$$x^n(w_1,j_1,w_2,j_2) = x^n(w_1,j_1,u^n(w_2,j_2)),\quad w_k \in \{1,\ldots,L_k\}\ \text{and}\ j_k \in \{1,\ldots,J_k\}.$$
Encoding: To send a message pair $(w_1,w_2)$, the transmitter randomly chooses a pair $(j_1,j_2)$ and sends the corresponding codeword $x^n(w_1,j_1,w_2,j_2)$ through the channel.
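The codebook generation and stochastic encoding steps above can be sketched as follows. This is a toy illustration under our own assumptions: binary alphabets for $U$ and $X$, Bernoulli choices for $p(u)$ and $p(x|u)$, and tiny codebook sizes. All function and parameter names are hypothetical. The sketch shows only the indexing structure, cloud centers $u^n(w_2,j_2)$ with satellites $x^n(w_1,j_1,w_2,j_2)$, not an actual capacity-achieving code.

```python
import random

def build_codebook(n, L1, J1, L2, J2, p_u, p_x_given_u, seed=0):
    """Toy superposition codebook with random binning over binary alphabets.

    p_u          -- P(U = 1)
    p_x_given_u  -- (P(X = 1 | U = 0), P(X = 1 | U = 1))
    """
    rng = random.Random(seed)
    # T2 = L2 * J2 cloud centers u^n(w2, j2), drawn i.i.d. from p(u)
    u_book = {(w2, j2): [int(rng.random() < p_u) for _ in range(n)]
              for w2 in range(1, L2 + 1) for j2 in range(1, J2 + 1)}
    # For each cloud center, T1 = L1 * J1 satellites drawn from prod_i p(x_i | u_i)
    x_book = {}
    for (w2, j2), u in u_book.items():
        for w1 in range(1, L1 + 1):
            for j1 in range(1, J1 + 1):
                x_book[(w1, j1, w2, j2)] = [int(rng.random() < p_x_given_u[ui]) for ui in u]
    return u_book, x_book

def encode(w1, w2, x_book, J1, J2, rng):
    """Stochastic encoder: the bin indices (j1, j2) are chosen uniformly at random."""
    j1 = rng.randint(1, J1)
    j2 = rng.randint(1, J2)
    return x_book[(w1, j1, w2, j2)]

u_book, x_book = build_codebook(n=8, L1=2, J1=3, L2=2, J2=2, p_u=0.5, p_x_given_u=(0.1, 0.9))
x = encode(1, 2, x_book, J1=3, J2=2, rng=random.Random(1))
print(len(u_book), len(x_book), x)
```

The random choice of $(j_1,j_2)$ is the randomization that the equivocation analysis later exploits. Each message is spread over $J_k$ equally likely codewords.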
Decoding: Receiver 2 determines the unique $w_2$ such that

$$\left(u^n(w_2,j_2), y_2^n\right) \in A_\epsilon^{(n)}(P_{U,Y_2})$$

where $A_\epsilon^{(n)}(P_{U,Y_2})$ denotes the set of jointly typical sequences $u^n$ and $y_2^n$ with respect to $p(u,y_2)$. If there is no such $w_2$ or more than one, an error is declared. Receiver 1 looks for the unique $(w_1,w_2)$ such that

$$\left(u^n(w_2,j_2), x^n(w_1,j_1,w_2,j_2), y_1^n\right) \in A_\epsilon^{(n)}(P_{U,X,Y_1})$$

where $A_\epsilon^{(n)}(P_{U,X,Y_1})$ denotes the set of jointly typical sequences $u^n$, $x^n$ and $y_1^n$ with respect to $p(u,x,y_1)$. If there is no such pair or more than one, an error is declared.

Error probability analysis: By the symmetry of the codebook generation, the probability of error does not depend on which codeword was sent. Hence, without loss of generality, we may assume that the transmitter sends the message pair $(w_1,w_2) = (1,1)$ associated with the codeword $x^n(1,1,1,1)$ and define the corresponding event

$$\mathcal{K}_1 := \{x^n(1,1,1,1)\ \text{was sent}\}.$$

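First consider the decoding at receiver 2. We will show that receiver 2 is able to decode $u^n(w_2,j_2)$ with small probability of error if $R_2 + R_2' < I(U;Y_2)$. To prove this, define the event

$$\mathcal{E}_2(w_2,j_2) := \left\{\left(u^n(w_2,j_2), y_2^n\right) \in A_\epsilon^{(n)}(P_{U,Y_2})\right\}.$$

Then, the probability of error at receiver 2 can be bounded from above as

$$\begin{aligned} P_e^{(2)} &\le \Pr\Big\{\mathcal{E}_2^c(1,1) \cup \bigcup_{(w_2,j_2)\ne(1,1)} \mathcal{E}_2(w_2,j_2) \,\Big|\, \mathcal{K}_1\Big\} \\ &\le \Pr\{\mathcal{E}_2^c(1,1)|\mathcal{K}_1\} + \sum_{(w_2,j_2)\ne(1,1)} \Pr\{\mathcal{E}_2(w_2,j_2)|\mathcal{K}_1\} \end{aligned}$$

where

$$\mathcal{E}_2^c(1,1) := \left\{\left(u^n(1,1), y_2^n\right) \notin A_\epsilon^{(n)}(P_{U,Y_2})\right\}.$$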
For large enough $n$ and $R_2 + R_2' < I(U;Y_2)$, the joint asymptotic equipartition property (AEP) [25, Chapter 14.2] implies

$$P_e^{(2)} \le 2\epsilon. \tag{64}$$
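The second term in the bound on $P_e^{(2)}$ is a union bound over roughly $2^{n(R_2+R_2')}$ wrong indices. The sketch below uses the common textbook form of the joint AEP, in which an independent $u^n$ is jointly typical with $y_2^n$ with probability at most $2^{-n(I(U;Y_2)-3\epsilon)}$. Under that form the sum is at most $2^{n(R_2+R_2'-I(U;Y_2)+3\epsilon)}$. The function names and the numbers in the example are hypothetical. The 3ε slack is our choice; the text above uses a generic ε.

```python
def union_bound_exponent(rate_sum, mutual_info, eps):
    """Exponent of the union bound on the 'some wrong (w2, j2) looks typical' term.

    About 2^{n(R2 + R2')} wrong index pairs, each jointly typical with y2^n with
    probability at most 2^{-n(I(U;Y2) - 3 eps)}, give a total of at most
    2^{n * exponent} with exponent = R2 + R2' - I(U;Y2) + 3 eps.
    """
    return rate_sum - mutual_info + 3 * eps

def union_bound(n, rate_sum, mutual_info, eps):
    return 2.0 ** (n * union_bound_exponent(rate_sum, mutual_info, eps))

# Rate sum below I(U;Y2) - 3 eps: the bound decays exponentially in n.
for n in (100, 1000, 5000):
    print(n, union_bound(n, rate_sum=0.40, mutual_info=0.50, eps=0.01))
```

The bound vanishes exactly when the exponent is negative. This is the role of the condition $R_2 + R_2' < I(U;Y_2)$, with ε taken small enough.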

Next, we will show that receiver 1 can successfully decode both $u^n$ and $x^n$ if

$$R_1 + R_1' < I(X;Y_1|U)\quad\text{and}\quad R_2 + R_2' < I(U;Y_2). \tag{65}$$

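Define the events

$$\mathcal{E}_{1,1}(w_1,j_1,w_2,j_2) := \left\{\left(u^n(w_2,j_2), x^n(w_1,j_1,w_2,j_2), y_1^n\right) \in A_\epsilon^{(n)}(P_{U,X,Y_1})\right\}$$

and

$$\mathcal{E}_1(w_2,j_2) := \left\{\left(u^n(w_2,j_2), y_1^n\right) \in A_\epsilon^{(n)}(P_{U,Y_1})\right\}$$

where $A_\epsilon^{(n)}(P_{U,Y_1})$ denotes the set of jointly typical sequences $u^n$ and $y_1^n$ with respect to $p(u,y_1)$. Then, the probability of error at receiver 1 is bounded as

$$P_e^{(1)} \le \Pr\{\mathcal{E}_1^c(1,1)|\mathcal{K}_1\} + \sum_{(w_2,j_2)\ne(1,1)}\Pr\{\mathcal{E}_1(w_2,j_2)|\mathcal{K}_1\} + \sum_{(w_1,j_1)\ne(1,1)}\Pr\{\mathcal{E}_{1,1}(w_1,j_1,1,1)|\mathcal{K}_1\}$$

where

$$\mathcal{E}_1^c(1,1) := \left\{\left(u^n(1,1), y_1^n\right) \notin A_\epsilon^{(n)}(P_{U,Y_1})\right\}.$$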

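By the AEP [25, Chapter 14.2],

$$\Pr\{\mathcal{E}_1^c(1,1)|\mathcal{K}_1\} \le \epsilon,$$

$$\Pr\{\mathcal{E}_1(w_2,j_2)|\mathcal{K}_1\} \le 2^{-n[I(U;Y_1)-\epsilon]},\quad\text{for } (w_2,j_2)\ne(1,1),$$

$$\text{and}\quad \Pr\{\mathcal{E}_{1,1}(w_1,j_1,1,1)|\mathcal{K}_1\} \le 2^{-n[I(X;Y_1|U)-\epsilon]},\quad\text{for } (w_1,j_1)\ne(1,1).$$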

Since the channel is degraded, we have $I(U;Y_1) \ge I(U;Y_2)$. Hence, if $n$ is large enough and the condition (65) holds, the probability of error at receiver 1 can be bounded from above as

$$P_e^{(1)} \le 3\epsilon. \tag{66}$$
Together, (64) and (66) show that $w_2$ can be decoded at receiver 2 and $(w_1,w_2)$ can be decoded at receiver 1 with a total probability of error that goes to zero as long as the rate pair $(R_1,R_2)$ satisfies (14).

Equivocation calculation: To show that (11) holds, we consider the following lower bound on the equivocation:

$$\begin{aligned} H(W_1|Y_2^n) &\ge H(W_1|Y_2^n,U^n) \\ &= H(X^n,Y_2^n|U^n) - H(X^n|W_1,Y_2^n,U^n) - H(Y_2^n|U^n) \\ &= H(X^n|U^n) + H(Y_2^n|X^n,U^n) - H(X^n|W_1,Y_2^n,U^n) - H(Y_2^n|U^n) \\ &= H(X^n|U^n) - I(X^n;Y_2^n|U^n) - H(X^n|W_1,Y_2^n,U^n) \end{aligned}\tag{67}$$

where the first equality is due to the fact that $W_1$ is independent of everything else given $X^n$.

According to the codebook generation, for a given $U^n = u^n$, $X^n$ has $T_1$ possible values with equal probabilities. Hence,

$$H(X^n|U^n) = n(R_1+R_1') = n[R_1 + I(X;Y_2|U) - \epsilon_1] \tag{68}$$

where (68) follows from the definition of $R_1'$ in (63a).

Next, we show that for any given $\epsilon_2 > 0$, $H(X^n|W_1,Y_2^n,U^n) \le n\epsilon_2$ for large enough $n$. To calculate $H(X^n|W_1,Y_2^n,U^n)$, consider the following hypothetical scenario. Fix $W_1 = w_1$, and assume that the transmitter sends a codeword $x^n(w_1,j_1,u^n(w_2,j_2))$, $j_1 \in \{1,\ldots,J_1\}$. Assume that receiver 2 knows the sequence $U^n = u^n(w_2,j_2)$. Given the index $W_1 = w_1$, receiver 2 decodes the codeword $x^n(w_1,j_1,u^n)$ (i.e., looks for the index $j_1$) based on the received sequence $y_2^n$. Let $\lambda(w_1)$ denote the average probability of error of decoding the index $j_1$ at receiver 2. Since $J_1 = 2^{n[I(X;Y_2|U)-\epsilon_1]}$, the AEP [25, Chapter 14.2] gives $\lambda(w_1) \le \epsilon$ for sufficiently large $n$. By Fano's inequality [25, Chapter 2.11],

$$\frac{1}{n}H(X^n|W_1 = w_1,Y_2^n,U^n) \le \frac{1}{n} + \lambda(w_1)\frac{\log J_1}{n} \le \frac{1}{n} + \epsilon R_1'.$$
Consequently,

$$\frac{1}{n}H(X^n|W_1,Y_2^n,U^n) = \frac{1}{n}\sum_{w_1}\Pr(W_1 = w_1)\,H(X^n|W_1 = w_1,Y_2^n,U^n) \le \epsilon_2. \tag{69}$$

By the AEP [25, Chapter 14.2], for any $\epsilon_3 > 0$,

$$I(X^n;Y_2^n|U^n) \le nI(X;Y_2|U) + n\epsilon_3 \tag{70}$$

for sufficiently large $n$. Substituting (68), (69), and (70) into (67), we have

$$\frac{1}{n}H(W_1|Y_2^n) \ge R_1 - (\epsilon_1+\epsilon_2+\epsilon_3).$$
n 

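Similarly, we can show that

$$H(W_2|Y_3^n) \ge H(U^n) - I(U^n;Y_3^n) - H(U^n|W_2,Y_3^n)$$

where

$$H(U^n) = n[R_2 + I(U;Y_3) - \epsilon_1],$$

$$H(U^n|W_2,Y_3^n) \le n\epsilon_2',$$

$$\text{and}\quad I(U^n;Y_3^n) \le n[I(U;Y_3) + \epsilon_3'],$$

and $\epsilon_2'$ and $\epsilon_3'$ vanish in the limit as $n \to \infty$. Hence,

$$\frac{1}{n}H(W_2|Y_3^n) \ge R_2 - (\epsilon_1 + \epsilon_2' + \epsilon_3').$$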



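Note that $Y_3$ is degraded with respect to $Y_2$. Therefore,

$$\begin{aligned} \frac{1}{n}H(W_1|Y_3^n) &\ge \frac{1}{n}H(W_1|Y_2^n,Y_3^n) \\ &= \frac{1}{n}H(W_1|Y_2^n) \\ &\ge R_1 - (\epsilon_1+\epsilon_2+\epsilon_3). \end{aligned}$$

This proves the security condition (11) and hence the achievability part of the theorem.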
B. The Converse

We first bound from above the secrecy rate $R_1$. The perfect secrecy condition (11) implies that for all $\epsilon > 0$,

$$H(W_1|Y_2^n) \ge H(W_1) - n\epsilon \tag{71a}$$

$$\text{and}\quad H(W_2|Y_3^n) \ge H(W_2) - n\epsilon. \tag{71b}$$

On the other hand, Fano's inequality [25, Chapter 2.11] implies that for any $\epsilon_0 > 0$,

$$H(W_1|Y_1^n) \le \epsilon_0\log\left(2^{nR_1}-1\right) + h(\epsilon_0) := n\delta_1 \tag{72a}$$

$$\text{and}\quad H(W_2|Y_2^n) \le \epsilon_0\log\left(2^{nR_2}-1\right) + h(\epsilon_0) := n\delta_2. \tag{72b}$$

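Thus,

$$\begin{aligned} nR_1 &= H(W_1) \\ &\le \left[H(W_1|Y_2^n) + n\epsilon\right] + \left[n\delta_1 - H(W_1|Y_1^n)\right] \\ &\le H(W_1,W_2|Y_2^n) - H(W_1|Y_1^n,W_2) + n(\epsilon+\delta_1) \\ &\le H(W_1|Y_2^n,W_2) - H(W_1|Y_1^n,W_2) + n(\epsilon+\delta_1+\delta_2) \end{aligned}\tag{73}$$

where the first inequality follows from (71a) and (72a), and the last inequality follows from (72b). Let $\delta = \epsilon+\delta_1+\delta_2$.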
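By the chain rule of mutual information [25, Chapter 2.5],

$$\begin{aligned} n(R_1-\delta) &\le I(W_1;Y_1^n|W_2) - I(W_1;Y_2^n|W_2) \\ &= \sum_{i=1}^n \left[I(W_1;Y_{1,i}|W_2,Y_{1,i+1}^n,Y_2^{i-1}) - I(W_1;Y_{2,i}|W_2,Y_{1,i+1}^n,Y_2^{i-1})\right] \end{aligned}\tag{74}$$

where the last equality follows from [21, Lemma 7]. Let

$$V_i := \left(Y_{1,i+1}^n, Y_2^{i-1}\right). \tag{75}$$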
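We read [21, Lemma 7] as the telescoping (Csiszár-sum type) identity used in (74) and again in (79). This reading is our assumption. Without the conditioning on $W_2$ and for $n = 2$, the identity states $I(W;Y_1^2) - I(W;Y_2^2) = \sum_{i=1}^{2}\left[I(W;Y_{1,i}|V_i) - I(W;Y_{2,i}|V_i)\right]$, with $V_1 = Y_{1,2}$ and $V_2 = Y_{2,1}$ as in (75). The identity holds for an arbitrary joint distribution, so the sketch below checks it on a random pmf over five binary variables. The helper names are hypothetical.

```python
import itertools
import math
import random

def entropy(pmf, idx):
    """Entropy in bits of the marginal over the variable positions in idx."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def cmi(pmf, a, b, c=()):
    """Conditional mutual information I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    return entropy(pmf, a + c) + entropy(pmf, b + c) - entropy(pmf, a + b + c) - entropy(pmf, c)

def sum_identity_gap(seed=0):
    """LHS minus RHS of the telescoping identity for n = 2 and a random joint pmf."""
    rng = random.Random(seed)
    outcomes = list(itertools.product((0, 1), repeat=5))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    pmf = {o: w / total for o, w in zip(outcomes, weights)}
    W, Y11, Y12, Y21, Y22 = range(5)  # positions of W, Y_{1,1}, Y_{1,2}, Y_{2,1}, Y_{2,2}
    lhs = cmi(pmf, [W], [Y11, Y12]) - cmi(pmf, [W], [Y21, Y22])
    # V_1 = Y_{1,2} and V_2 = Y_{2,1}, following the pattern of (75) with n = 2
    rhs = (cmi(pmf, [W], [Y11], [Y12]) - cmi(pmf, [W], [Y21], [Y12])
           + cmi(pmf, [W], [Y12], [Y21]) - cmi(pmf, [W], [Y22], [Y21]))
    return lhs - rhs

print(max(abs(sum_identity_gap(s)) for s in range(20)))
```

No channel structure is used here. Degradedness and memorylessness enter only in the later steps (76) and (79)-(82).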
We can further bound (74) from above as

$$\begin{aligned} n(R_1-\delta) &\le \sum_{i=1}^n \left[I(W_1,X_i;Y_{1,i}|W_2,V_i) - I(W_1,X_i;Y_{2,i}|W_2,V_i)\right] \\ &\quad - \sum_{i=1}^n \left[I(X_i;Y_{1,i}|W_1,W_2,V_i) - I(X_i;Y_{2,i}|W_1,W_2,V_i)\right] \\ &\le \sum_{i=1}^n \left[I(W_1,X_i;Y_{1,i}|W_2,V_i) - I(W_1,X_i;Y_{2,i}|W_2,V_i)\right] \\ &= \sum_{i=1}^n \left[I(X_i;Y_{1,i}|W_2,V_i) - I(X_i;Y_{2,i}|W_2,V_i)\right] \end{aligned}\tag{76}$$

where the second inequality follows from the Markov relation

$$(W_1,W_2,V_i) \to X_i \to Y_{1,i} \to Y_{2,i},$$

and the last equality is due to the fact that $Y_{1,i}$ and $Y_{2,i}$ are conditionally independent of everything else given $X_i$.
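Next, we bound from above the secrecy rate $R_2$. By (71b) and (72b),

$$\begin{aligned} nR_2 &= H(W_2) \\ &\le \left[H(W_2|Y_3^n) + n\epsilon\right] + \left[n\delta_2 - H(W_2|Y_2^n)\right] \\ &= I(W_2;Y_2^n) - I(W_2;Y_3^n) + n(\epsilon+\delta_2) \\ &= \sum_{i=1}^n \left[I(W_2;Y_{2,i}|Y_{2,i+1}^n) - I(W_2;Y_{3,i}|Y_3^{i-1})\right] + n(\epsilon+\delta_2). \end{aligned}\tag{77}$$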



Let $\delta' := \epsilon + \delta_2$ and

$$V_i' := \left(Y_{2,i+1}^n, Y_3^{i-1}\right). \tag{78}$$

Applying [21, Lemma 7] again, we may obtain

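$$\begin{aligned} n(R_2-\delta') &\le \sum_{i=1}^n \left[I(W_2;Y_{2,i}|V_i') - I(W_2;Y_{3,i}|V_i')\right] \\ &= \sum_{i=1}^n \left[I(W_2,V_i';Y_{2,i}) - I(W_2,V_i';Y_{3,i})\right] - \sum_{i=1}^n \left[I(V_i';Y_{2,i}) - I(V_i';Y_{3,i})\right] \\ &\le \sum_{i=1}^n \left[I(W_2,V_i';Y_{2,i}) - I(W_2,V_i';Y_{3,i})\right] \end{aligned}\tag{79}$$

where the last inequality follows from the Markov relation $V_i' \to Y_{2,i} \to Y_{3,i}$. Furthermore, by the definitions of $V_i$ and $V_i'$ in (75) and (78), respectively,

$$V_i' \to (W_2,V_i) \to (Y_{2,i},Y_{3,i}). \tag{80}$$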
By (79) and (80),

$$\begin{aligned} n(R_2-\delta') &\le \sum_{i=1}^n \left[I(W_2,V_i,V_i';Y_{2,i}) - I(W_2,V_i,V_i';Y_{3,i})\right] - \sum_{i=1}^n \left[I(V_i;Y_{2,i}|W_2,V_i') - I(V_i;Y_{3,i}|W_2,V_i')\right] \\ &= \sum_{i=1}^n \left[I(W_2,V_i;Y_{2,i}) - I(W_2,V_i;Y_{3,i})\right] - \sum_{i=1}^n \left[I(V_i;Y_{2,i}|W_2,V_i') - I(V_i;Y_{3,i}|W_2,V_i')\right]. \end{aligned}\tag{81}$$

Note that $Y_{3,i}$ is conditionally independent of everything else given $Y_{2,i}$. Hence,

$$\begin{aligned} I(V_i;Y_{3,i}|W_2,V_i') &\le I(V_i;Y_{2,i},Y_{3,i}|W_2,V_i') \\ &= I(V_i;Y_{2,i}|W_2,V_i') + I(V_i;Y_{3,i}|Y_{2,i},W_2,V_i') \\ &= I(V_i;Y_{2,i}|W_2,V_i'). \end{aligned}\tag{82}$$



Substituting (82) into (81), we have

$$R_2 \le \frac{1}{n}\sum_{i=1}^n \left[I(W_2,V_i;Y_{2,i}) - I(W_2,V_i;Y_{3,i})\right] + \delta'. \tag{83}$$

Finally, let

$$U_i := (W_2,V_i). \tag{84}$$

With this definition, (76) and (83) can be rewritten as

$$R_1 \le \frac{1}{n}\sum_{i=1}^n \left[I(X_i;Y_{1,i}|U_i) - I(X_i;Y_{2,i}|U_i)\right] + \delta$$

$$\text{and}\quad R_2 \le \frac{1}{n}\sum_{i=1}^n \left[I(U_i;Y_{2,i}) - I(U_i;Y_{3,i})\right] + \delta'. \tag{85}$$

Following the standard single-letterization process (e.g., see [25, Chapter 14.3]), we have the desired converse result.

Appendix II
Proof of Corollary 1

Fix $U = u$. By the generalized Costa EPI we have



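$$h(\mathbf{X}+\mathbf{A}^{1/2}\mathbf{Z}|U=u) \ge \frac{n}{2}\log\left\{|\mathbf{I}-\mathbf{A}|^{1/n}\exp\left[\frac{2}{n}h(\mathbf{X}|U=u)\right] + |\mathbf{A}|^{1/n}\exp\left[\frac{2}{n}h(\mathbf{X}+\mathbf{Z}|U=u)\right]\right\}. \tag{86}$$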
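Taking expectation over $U$ on both sides of (86), we may obtain

$$\begin{aligned} h(\mathbf{X}+\mathbf{A}^{1/2}\mathbf{Z}|U) &\ge \frac{n}{2}\,\mathbb{E}\left[\log\left\{|\mathbf{I}-\mathbf{A}|^{1/n}\exp\left[\frac{2}{n}h(\mathbf{X}|U=u)\right] + |\mathbf{A}|^{1/n}\exp\left[\frac{2}{n}h(\mathbf{X}+\mathbf{Z}|U=u)\right]\right\}\right] \\ &\ge \frac{n}{2}\log\left\{|\mathbf{I}-\mathbf{A}|^{1/n}\exp\left[\frac{2}{n}\mathbb{E}[h(\mathbf{X}|U=u)]\right] + |\mathbf{A}|^{1/n}\exp\left[\frac{2}{n}\mathbb{E}[h(\mathbf{X}+\mathbf{Z}|U=u)]\right]\right\} \\ &= \frac{n}{2}\log\left\{|\mathbf{I}-\mathbf{A}|^{1/n}\exp\left[\frac{2}{n}h(\mathbf{X}|U)\right] + |\mathbf{A}|^{1/n}\exp\left[\frac{2}{n}h(\mathbf{X}+\mathbf{Z}|U)\right]\right\} \end{aligned}\tag{87}$$

where the second inequality follows from Jensen's inequality [25, Chapter 2.6] and the convexity of $\log(a_1e^{x_1} + a_2e^{x_2})$ in $(x_1,x_2)$ for $a_1,a_2 > 0$. Since $t \mapsto \exp(2t/n)$ is increasing, (87) is equivalent to the desired inequality (32).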
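The Jensen step in (87) can be illustrated numerically. In the sketch below, a list of equally likely sample pairs stands in for the values $\left(\frac{2}{n}h(\mathbf{X}|U=u), \frac{2}{n}h(\mathbf{X}+\mathbf{Z}|U=u)\right)$, and $a_1, a_2$ stand in for $|\mathbf{I}-\mathbf{A}|^{1/n}$ and $|\mathbf{A}|^{1/n}$. Treating $U$ as uniform over the samples is our illustrative assumption, and the function names are hypothetical.

```python
import math
import random

def g(x1, x2, a1, a2):
    """log(a1 e^{x1} + a2 e^{x2}); convex in (x1, x2) when a1, a2 > 0."""
    m = max(x1, x2)  # factor out the larger exponent for numerical stability
    return m + math.log(a1 * math.exp(x1 - m) + a2 * math.exp(x2 - m))

def jensen_gap(samples, a1, a2):
    """E[g(X1, X2)] - g(E[X1], E[X2]) for equally likely samples; nonnegative by Jensen."""
    k = len(samples)
    mean_g = sum(g(x1, x2, a1, a2) for x1, x2 in samples) / k
    m1 = sum(x1 for x1, _ in samples) / k
    m2 = sum(x2 for _, x2 in samples) / k
    return mean_g - g(m1, m2, a1, a2)

rng = random.Random(0)
samples = [(rng.gauss(0, 2), rng.gauss(1, 2)) for _ in range(50)]
print(jensen_gap(samples, a1=0.3, a2=0.7))
```

The gap is zero when all samples coincide, which corresponds to the conditional entropies not depending on $u$. The gap is generally positive otherwise.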

Appendix III 
Proof of Corollary 2

Note that when $\mu = 0$, (33) implies that $\mathbf{N}_1 = \mathbf{N}_2$. Thus, both sides of (34) are equal to zero and the inequality holds trivially with equality. For the rest of the proof, we will assume that $\mu > 0$. The proof is rather long, so we divide it into several steps.

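Step 1: Generalized eigenvalue decomposition. We start by applying the generalized eigenvalue decomposition [23] to the positive definite matrices $\mathbf{B}^*+\mathbf{N}_1$ and $\mathbf{B}^*+\mathbf{N}_2$. There exists an invertible generalized eigenvector matrix $\mathbf{V}$ such that

$$\mathbf{V}^T(\mathbf{B}^*+\mathbf{N}_1)\mathbf{V} = \boldsymbol{\Lambda}_1 \tag{88}$$

$$\text{and}\quad \mathbf{V}^T(\mathbf{B}^*+\mathbf{N}_2)\mathbf{V} = \boldsymbol{\Lambda}_2 \tag{89}$$

where $\boldsymbol{\Lambda}_1$ and $\boldsymbol{\Lambda}_2$ are positive definite diagonal matrices. Let

$$\boldsymbol{\Lambda}_3 := \mathbf{V}^T(\mathbf{B}^*+\mathbf{N}_3)\mathbf{V} \tag{90}$$

be an $n\times n$ positive definite matrix. By (33) and the identities $(\mathbf{B}^*+\mathbf{N}_k)^{-1} = \mathbf{V}\boldsymbol{\Lambda}_k^{-1}\mathbf{V}^T$, $k=1,2,3$,

$$\boldsymbol{\Lambda}_1^{-1} + \mu\boldsymbol{\Lambda}_3^{-1} = (1+\mu)\boldsymbol{\Lambda}_2^{-1}. \tag{91}$$

Thus, since $\mu > 0$, $\boldsymbol{\Lambda}_3^{-1} = [(1+\mu)\boldsymbol{\Lambda}_2^{-1} - \boldsymbol{\Lambda}_1^{-1}]/\mu$ is diagonal, and so is $\boldsymbol{\Lambda}_3$. Moreover, since $\mathbf{N}_1 \preceq \mathbf{N}_3$,

$$\boldsymbol{\Lambda}_1 - \boldsymbol{\Lambda}_3 = \mathbf{V}^T(\mathbf{N}_1-\mathbf{N}_3)\mathbf{V} \preceq \mathbf{0}$$

and hence

$$\boldsymbol{\Lambda}_1 \preceq \boldsymbol{\Lambda}_3. \tag{92}$$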
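For intuition about Step 1, the sketch below computes one valid generalized eigenvector matrix $\mathbf{V}$ for a pair of 2x2 symmetric positive definite matrices in pure Python. It solves $\det(\mathbf{B}-\lambda\mathbf{A}) = 0$ and normalizes each eigenvector so that $\mathbf{v}^T\mathbf{A}\mathbf{v} = 1$. With $\mathbf{A} = \mathbf{B}^*+\mathbf{N}_1$, this particular normalization yields $\boldsymbol{\Lambda}_1 = \mathbf{I}$, which is one admissible choice, since the proof only needs $\mathbf{V}$ invertible with $\boldsymbol{\Lambda}_1$, $\boldsymbol{\Lambda}_2$ diagonal. The matrices below are illustrative stand-ins, not taken from the paper. We also assume distinct generalized eigenvalues. In practice, one would call a generalized symmetric eigensolver, for example SciPy's `scipy.linalg.eigh(a, b)`.

```python
import math

def quad(M, v):
    """v^T M v for a 2x2 matrix M and a 2-vector v."""
    return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

def congruence(V, M):
    """V^T M V for 2x2 matrices stored as nested lists (V[k][i] = entry k of column i)."""
    return [[sum(V[k][i] * M[k][l] * V[l][j] for k in range(2) for l in range(2))
             for j in range(2)] for i in range(2)]

def gen_eig_2x2(A, B):
    """Columns v1, v2 with V^T A V = I and V^T B V diagonal.

    Assumes A, B symmetric, A positive definite, and distinct generalized eigenvalues.
    """
    a = A[0][0] * A[1][1] - A[0][1] ** 2
    b = -(B[0][0] * A[1][1] + A[0][0] * B[1][1] - 2 * A[0][1] * B[0][1])
    c = B[0][0] * B[1][1] - B[0][1] ** 2
    disc = math.sqrt(max(b * b - 4 * a * c, 0.0))
    cols = []
    for lam in ((-b - disc) / (2 * a), (-b + disc) / (2 * a)):  # roots of det(B - lam A) = 0
        M = [[B[i][j] - lam * A[i][j] for j in range(2)] for i in range(2)]
        # M has rank one; a null vector is orthogonal to its larger row
        r = M[0] if abs(M[0][0]) + abs(M[0][1]) >= abs(M[1][0]) + abs(M[1][1]) else M[1]
        v = [r[1], -r[0]]
        scale = math.sqrt(quad(A, v))  # normalize so that v^T A v = 1
        cols.append([v[0] / scale, v[1] / scale])
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]

# Illustrative stand-ins for B* + N1 and B* + N2
A = [[2.0, 0.5], [0.5, 1.0]]
B = [[3.0, 0.2], [0.2, 2.5]]
V = gen_eig_2x2(A, B)
print(congruence(V, A))  # approximately the identity (Lambda_1 = I here)
print(congruence(V, B))  # approximately diagonal (Lambda_2)
```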
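Step 2: Choosing the matrix parameter $\mathbf{A}$. Let $\tilde{\boldsymbol{\Lambda}}_3 = \boldsymbol{\Lambda}_3 + \epsilon\mathbf{I}$ for some $\epsilon > 0$, and let $\tilde{\boldsymbol{\Lambda}}_2$ be an $n\times n$ matrix such that

$$\boldsymbol{\Lambda}_1^{-1} + \mu\tilde{\boldsymbol{\Lambda}}_3^{-1} = (1+\mu)\tilde{\boldsymbol{\Lambda}}_2^{-1}. \tag{93}$$

Clearly, $\tilde{\boldsymbol{\Lambda}}_2$ is diagonal. Moreover, by (92),

$$\boldsymbol{\Lambda}_1 \prec \tilde{\boldsymbol{\Lambda}}_3. \tag{94}$$

Note that $\mu > 0$, so by (93) and (94),

$$\boldsymbol{\Lambda}_1 \prec \tilde{\boldsymbol{\Lambda}}_2 \prec \tilde{\boldsymbol{\Lambda}}_3. \tag{95}$$

Comparing (91) and (93) and using the fact that $\boldsymbol{\Lambda}_3 \prec \tilde{\boldsymbol{\Lambda}}_3$, we have

$$\boldsymbol{\Lambda}_2 \prec \tilde{\boldsymbol{\Lambda}}_2. \tag{96}$$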

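Now let

$$\mathbf{Y}_1 := \mathbf{V}^T(\mathbf{X}+\mathbf{Z}_1),\quad \mathbf{Y}_2 := \mathbf{V}^T(\mathbf{X}+\tilde{\mathbf{Z}}_2),\quad\text{and}\quad \mathbf{Y}_3 := \mathbf{V}^T(\mathbf{X}+\tilde{\mathbf{Z}}_3)$$

where $\tilde{\mathbf{Z}}_2$ and $\tilde{\mathbf{Z}}_3$ are Gaussian $n$-vectors with covariance matrices

$$\begin{aligned} \tilde{\mathbf{N}}_2 &= \mathbf{V}^{-T}\tilde{\boldsymbol{\Lambda}}_2\mathbf{V}^{-1} - \mathbf{B}^* \\ &\succ \mathbf{V}^{-T}\boldsymbol{\Lambda}_2\mathbf{V}^{-1} - \mathbf{B}^* \\ &= (\mathbf{B}^*+\mathbf{N}_2) - \mathbf{B}^* \\ &= \mathbf{N}_2 \end{aligned}$$

and

$$\begin{aligned} \tilde{\mathbf{N}}_3 &= \mathbf{V}^{-T}\tilde{\boldsymbol{\Lambda}}_3\mathbf{V}^{-1} - \mathbf{B}^* \\ &= \mathbf{V}^{-T}(\boldsymbol{\Lambda}_3+\epsilon\mathbf{I})\mathbf{V}^{-1} - \mathbf{B}^* \\ &= (\mathbf{B}^*+\mathbf{N}_3+\epsilon\mathbf{V}^{-T}\mathbf{V}^{-1}) - \mathbf{B}^* \\ &= \mathbf{N}_3 + \epsilon\mathbf{V}^{-T}\mathbf{V}^{-1}, \end{aligned}$$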
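respectively, and are independent of $\mathbf{X}$. The covariance matrices of $\mathbf{Y}_k$, $k = 1,2,3$, can be calculated as $\mathbf{V}^T[\mathrm{Cov}(\mathbf{X})-\mathbf{B}^*]\mathbf{V} + \boldsymbol{\Lambda}_1$, $\mathbf{V}^T[\mathrm{Cov}(\mathbf{X})-\mathbf{B}^*]\mathbf{V} + \tilde{\boldsymbol{\Lambda}}_2$ and $\mathbf{V}^T[\mathrm{Cov}(\mathbf{X})-\mathbf{B}^*]\mathbf{V} + \tilde{\boldsymbol{\Lambda}}_3$, respectively. Thus, $\mathbf{Y}_2$ and $\mathbf{Y}_3$ can be equivalently written as

$$\mathbf{Y}_3 = \mathbf{Y}_1 + \mathbf{Z}\quad\text{and}\quad \mathbf{Y}_2 = \mathbf{Y}_1 + \mathbf{A}^{1/2}\mathbf{Z}$$

where $\mathbf{Z}$ is a Gaussian $n$-vector with covariance matrix $\tilde{\boldsymbol{\Lambda}}_3 - \boldsymbol{\Lambda}_1 \succ \mathbf{0}$ and is independent of $\mathbf{Y}_1$, and

$$\mathbf{A} := (\tilde{\boldsymbol{\Lambda}}_2 - \boldsymbol{\Lambda}_1)(\tilde{\boldsymbol{\Lambda}}_3 - \boldsymbol{\Lambda}_1)^{-1}. \tag{97}$$

Clearly, $\mathbf{A}$ is diagonal. Moreover, by (95), $\mathbf{0} \prec \mathbf{A} \prec \mathbf{I}$.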

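Step 3: Applying the generalized Costa EPI. By the generalized Costa EPI (32),

$$h(\mathbf{Y}_2|U) \ge \frac{n}{2}\log\left\{|\mathbf{I}-\mathbf{A}|^{1/n}\exp\left[\frac{2}{n}h(\mathbf{Y}_1|U)\right] + |\mathbf{A}|^{1/n}\exp\left[\frac{2}{n}h(\mathbf{Y}_3|U)\right]\right\}.$$

Thus,

$$\begin{aligned} &h(\mathbf{Y}_1|U) + \mu h(\mathbf{Y}_3|U) - (1+\mu)h(\mathbf{Y}_2|U) \\ &\quad\le h(\mathbf{Y}_1|U) + \mu h(\mathbf{Y}_3|U) - \frac{(1+\mu)n}{2}\log\left\{|\mathbf{I}-\mathbf{A}|^{1/n}\exp\left[\frac{2}{n}h(\mathbf{Y}_1|U)\right] + |\mathbf{A}|^{1/n}\exp\left[\frac{2}{n}h(\mathbf{Y}_3|U)\right]\right\}. \end{aligned}\tag{98}$$

Now we consider the function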



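$$f(b,c) = b + \mu c - \frac{(1+\mu)n}{2}\log\left[|\mathbf{I}-\mathbf{A}|^{1/n}\exp\left(\frac{2b}{n}\right) + |\mathbf{A}|^{1/n}\exp\left(\frac{2c}{n}\right)\right].$$

Note that

$$\nabla f(b,c) = \begin{bmatrix} 1 \\ \mu \end{bmatrix} - \frac{1+\mu}{|\mathbf{I}-\mathbf{A}|^{1/n}\exp(2b/n) + |\mathbf{A}|^{1/n}\exp(2c/n)}\begin{bmatrix} |\mathbf{I}-\mathbf{A}|^{1/n}\exp(2b/n) \\ |\mathbf{A}|^{1/n}\exp(2c/n) \end{bmatrix}$$

and

$$\nabla^2 f(b,c) = -\frac{2(1+\mu)\,|\mathbf{A}|^{1/n}|\mathbf{I}-\mathbf{A}|^{1/n}\exp[(2b+2c)/n]}{n\left[|\mathbf{I}-\mathbf{A}|^{1/n}\exp(2b/n) + |\mathbf{A}|^{1/n}\exp(2c/n)\right]^2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \preceq \mathbf{0}.$$

So $f(b,c)$ is concave in $(b,c)$. By setting $\nabla f(b,c) = \mathbf{0}$, the global maximum is achieved when

$$c = b + \frac{n}{2}\log\frac{\mu|\mathbf{I}-\mathbf{A}|^{1/n}}{|\mathbf{A}|^{1/n}}$$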
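and the maximum is given by

$$\frac{\mu n}{2}\log\frac{\mu|\mathbf{I}-\mathbf{A}|^{1/n}}{|\mathbf{A}|^{1/n}} - \frac{(1+\mu)n}{2}\log\left[(1+\mu)|\mathbf{I}-\mathbf{A}|^{1/n}\right].$$

Hence, applying this bound to the right-hand side of (98) with $b = h(\mathbf{Y}_1|U)$ and $c = h(\mathbf{Y}_3|U)$,

$$h(\mathbf{Y}_1|U) + \mu h(\mathbf{Y}_3|U) - (1+\mu)h(\mathbf{Y}_2|U) \le \frac{\mu n}{2}\log\frac{\mu|\mathbf{I}-\mathbf{A}|^{1/n}}{|\mathbf{A}|^{1/n}} - \frac{(1+\mu)n}{2}\log\left[(1+\mu)|\mathbf{I}-\mathbf{A}|^{1/n}\right]. \tag{99}$$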
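Shifting both arguments by the same amount leaves $f$ unchanged: $f(b+t,c+t) = f(b,c)$, since the $(1+\mu)t$ gained in $b+\mu c$ is exactly cancelled by the log term. The maximum is therefore attained along a whole line, which is consistent with the stationarity condition above fixing only $c - b$. The sketch below is our own numeric check. It fixes $b = 0$, maximizes over $c$ by ternary search, and compares the result with the closed-form maximum appearing in (99). The diagonal $\mathbf{A}$ and the value of $\mu$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def f(b, c, mu, det_a, det_i_minus_a, n):
    """f(b, c) from Step 3, with |A| = det_a and |I - A| = det_i_minus_a."""
    alpha, beta = det_i_minus_a ** (1 / n), det_a ** (1 / n)
    s = alpha * math.exp(2 * b / n) + beta * math.exp(2 * c / n)
    return b + mu * c - (1 + mu) * n / 2 * math.log(s)

def closed_form_max(mu, det_a, det_i_minus_a, n):
    """The maximum of f stated in the text (right-hand side of (99))."""
    alpha, beta = det_i_minus_a ** (1 / n), det_a ** (1 / n)
    return (mu * n / 2 * math.log(mu * alpha / beta)
            - (1 + mu) * n / 2 * math.log((1 + mu) * alpha))

def numeric_max(mu, det_a, det_i_minus_a, n, lo=-50.0, hi=50.0, iters=200):
    """Ternary search over c with b = 0 (f is concave in c and invariant along (1, 1))."""
    obj = lambda c: f(0.0, c, mu, det_a, det_i_minus_a, n)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if obj(m1) < obj(m2):
            lo = m1
        else:
            hi = m2
    return obj((lo + hi) / 2)

# Example: A = diag(0.3, 0.6), so n = 2
a_diag = [0.3, 0.6]
det_a = math.prod(a_diag)
det_i_minus_a = math.prod(1 - a for a in a_diag)
print(numeric_max(1.5, det_a, det_i_minus_a, 2), closed_form_max(1.5, det_a, det_i_minus_a, 2))
```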



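Step 4: Calculating $\log|\mathbf{A}|$ and $\log|\mathbf{I}-\mathbf{A}|$. Note that (93) can be rewritten as

$$\mu\left(\boldsymbol{\Lambda}_1^{-1} - \tilde{\boldsymbol{\Lambda}}_3^{-1}\right) = (1+\mu)\left(\boldsymbol{\Lambda}_1^{-1} - \tilde{\boldsymbol{\Lambda}}_2^{-1}\right)$$

which gives

$$(\tilde{\boldsymbol{\Lambda}}_2 - \boldsymbol{\Lambda}_1)(\tilde{\boldsymbol{\Lambda}}_3 - \boldsymbol{\Lambda}_1)^{-1} = \frac{\mu}{1+\mu}\tilde{\boldsymbol{\Lambda}}_2\tilde{\boldsymbol{\Lambda}}_3^{-1}. \tag{100}$$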



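Similarly, we have

$$\boldsymbol{\Lambda}_1^{-1} - \tilde{\boldsymbol{\Lambda}}_3^{-1} = (1+\mu)\left(\tilde{\boldsymbol{\Lambda}}_2^{-1} - \tilde{\boldsymbol{\Lambda}}_3^{-1}\right)$$

and hence

$$(\tilde{\boldsymbol{\Lambda}}_3 - \tilde{\boldsymbol{\Lambda}}_2)(\tilde{\boldsymbol{\Lambda}}_3 - \boldsymbol{\Lambda}_1)^{-1} = \frac{1}{1+\mu}\tilde{\boldsymbol{\Lambda}}_2\boldsymbol{\Lambda}_1^{-1}. \tag{101}$$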



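According to the definition of $\mathbf{A}$ in (97),

$$\log|\mathbf{A}| = \log\left|(\tilde{\boldsymbol{\Lambda}}_2 - \boldsymbol{\Lambda}_1)(\tilde{\boldsymbol{\Lambda}}_3 - \boldsymbol{\Lambda}_1)^{-1}\right| = n\log\frac{\mu}{1+\mu} + \log|\tilde{\boldsymbol{\Lambda}}_2| - \log|\tilde{\boldsymbol{\Lambda}}_3| \tag{102}$$

and

$$\log|\mathbf{I}-\mathbf{A}| = \log\left|(\tilde{\boldsymbol{\Lambda}}_3 - \tilde{\boldsymbol{\Lambda}}_2)(\tilde{\boldsymbol{\Lambda}}_3 - \boldsymbol{\Lambda}_1)^{-1}\right| = n\log\frac{1}{1+\mu} + \log|\tilde{\boldsymbol{\Lambda}}_2| - \log|\boldsymbol{\Lambda}_1| \tag{103}$$

where (102) and (103) follow from (100) and (101), respectively. Substituting (102) and (103) into (99), we have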



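$$h(\mathbf{Y}_1|U) + \mu h(\mathbf{Y}_3|U) - (1+\mu)h(\mathbf{Y}_2|U) \le \frac{1}{2}\log|\boldsymbol{\Lambda}_1| + \frac{\mu}{2}\log|\tilde{\boldsymbol{\Lambda}}_3| - \frac{1+\mu}{2}\log|\tilde{\boldsymbol{\Lambda}}_2|. \tag{104}$$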
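The passage from (99) to (104) is mechanical once (100)-(103) are in hand. All matrices involved are diagonal, so the check can be done entrywise. The sketch below is our own illustration with hypothetical names. It builds $\tilde{\boldsymbol{\Lambda}}_3$, $\tilde{\boldsymbol{\Lambda}}_2$ via (93) and $\mathbf{A}$ via (97) from illustrative diagonal entries satisfying $\boldsymbol{\Lambda}_1 \preceq \boldsymbol{\Lambda}_3$. It then evaluates the right-hand sides of (99) and (104).

```python
import math

def step4_check(lam1, lam3, mu, eps):
    """Right-hand sides of (99) and (104) for diagonal matrices.

    lam1, lam3 -- diagonal entries of Lambda_1 and Lambda_3 (entrywise lam1 <= lam3)
    """
    lam3t = [x + eps for x in lam3]                                          # tilde Lambda_3
    lam2t = [(1 + mu) / (1 / l1 + mu / l3) for l1, l3 in zip(lam1, lam3t)]   # from (93)
    a = [(l2 - l1) / (l3 - l1) for l1, l2, l3 in zip(lam1, lam2t, lam3t)]    # A from (97)
    assert all(0 < x < 1 for x in a)                                        # 0 < A < I
    n = len(lam1)
    log_det_a = sum(math.log(x) for x in a)
    log_det_i_minus_a = sum(math.log(1 - x) for x in a)
    # Right-hand side of (99), rewritten in terms of log-determinants
    bound_99 = (mu * n / 2 * math.log(mu) + mu / 2 * (log_det_i_minus_a - log_det_a)
                - (1 + mu) * n / 2 * math.log(1 + mu) - (1 + mu) / 2 * log_det_i_minus_a)
    # Right-hand side of (104)
    bound_104 = (0.5 * sum(map(math.log, lam1)) + mu / 2 * sum(map(math.log, lam3t))
                 - (1 + mu) / 2 * sum(map(math.log, lam2t)))
    return bound_99, bound_104

print(step4_check([1.0, 0.5, 2.0], [1.5, 0.5, 4.0], mu=2.0, eps=0.1))
```

The two returned values agree up to rounding, and the entries of $\mathbf{A}$ fall strictly inside $(0,1)$ as claimed after (97).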
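Step 5: Letting $\epsilon \downarrow 0$. Note that $\tilde{\boldsymbol{\Lambda}}_3 = \boldsymbol{\Lambda}_3 + \epsilon\mathbf{I} \to \boldsymbol{\Lambda}_3$ and $\tilde{\mathbf{N}}_3 = \mathbf{N}_3 + \epsilon\mathbf{V}^{-T}\mathbf{V}^{-1} \to \mathbf{N}_3$ in the limit as $\epsilon \downarrow 0$. Moreover, by (93) we have $\tilde{\boldsymbol{\Lambda}}_2 \to \boldsymbol{\Lambda}_2$, and hence

$$\tilde{\mathbf{N}}_2 = \mathbf{V}^{-T}\tilde{\boldsymbol{\Lambda}}_2\mathbf{V}^{-1} - \mathbf{B}^* \to \mathbf{V}^{-T}\boldsymbol{\Lambda}_2\mathbf{V}^{-1} - \mathbf{B}^* = (\mathbf{B}^*+\mathbf{N}_2) - \mathbf{B}^* = \mathbf{N}_2.$$

Letting $\epsilon \downarrow 0$ on both sides of (104), we have

$$h(\mathbf{V}^T(\mathbf{X}+\mathbf{Z}_1)|U) + \mu h(\mathbf{V}^T(\mathbf{X}+\mathbf{Z}_3)|U) - (1+\mu)h(\mathbf{V}^T(\mathbf{X}+\mathbf{Z}_2)|U) \le \frac{1}{2}\log|\boldsymbol{\Lambda}_1| + \frac{\mu}{2}\log|\boldsymbol{\Lambda}_3| - \frac{1+\mu}{2}\log|\boldsymbol{\Lambda}_2|. \tag{105}$$

Using the fact that

$$h(\mathbf{V}^T(\mathbf{X}+\mathbf{Z}_k)|U) = h(\mathbf{X}+\mathbf{Z}_k|U) + \log|\det\mathbf{V}|$$

and

$$\log|\boldsymbol{\Lambda}_k| = \log|\mathbf{V}^T(\mathbf{B}^*+\mathbf{N}_k)\mathbf{V}| = \log|\mathbf{B}^*+\mathbf{N}_k| + 2\log|\det\mathbf{V}|$$

for $k = 1,2,3$, the desired inequality (34) can be obtained from (105). This completes the proof of the corollary.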

Appendix IV 
Proof of Corollary 3

Here, we prove Corollary 3 using mathematical induction. Note that when $K = 1$, (35) implies that $\mathbf{N}_1 = \mathbf{N}_0$. Thus, the inequality (36) holds trivially with equality for any $(U,\mathbf{X})$ independent of $(\mathbf{Z}_0,\mathbf{Z}_1)$.

Assume that the inequality (36) holds for $K = Q-1$. Let $\mathbf{N}$ be an $n\times n$ symmetric matrix such that

$$(\mathbf{B}^*+\mathbf{N})^{-1} = \sum_{k=1}^{Q-1}\mu_k'(\mathbf{B}^*+\mathbf{N}_k)^{-1} \tag{106}$$

where $\mu_k' := \mu_k/(1-\mu_Q)$ for $k = 1,\ldots,Q$.
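By the assumption $\mathbf{N}_1 \preceq \cdots \preceq \mathbf{N}_{Q-1}$, we have from (106)

$$\mathbf{N}_1 \preceq \mathbf{N} \preceq \mathbf{N}_{Q-1}. \tag{107}$$

Let $\mathbf{Z}$ be a Gaussian random $n$-vector with covariance matrix $\mathbf{N}$ and independent of $(U,\mathbf{X})$. By the induction assumption and (106),

$$\sum_{k=1}^{Q-1}\mu_k' h(\mathbf{X}+\mathbf{Z}_k|U) - h(\mathbf{X}+\mathbf{Z}|U) \le \sum_{k=1}^{Q-1}\frac{\mu_k'}{2}\log|\mathbf{B}^*+\mathbf{N}_k| - \frac{1}{2}\log|\mathbf{B}^*+\mathbf{N}|. \tag{108}$$

On the other hand, substituting (106) into (35), we have

$$(\mathbf{B}^*+\mathbf{N})^{-1} + \mu_Q'(\mathbf{B}^*+\mathbf{N}_Q)^{-1} = (1+\mu_Q')(\mathbf{B}^*+\mathbf{N}_0)^{-1}.$$

Note from (107) that $\mathbf{N} \preceq \mathbf{N}_{Q-1} \preceq \mathbf{N}_Q$. Thus, by Corollary 2,

$$h(\mathbf{X}+\mathbf{Z}|U) + \mu_Q' h(\mathbf{X}+\mathbf{Z}_Q|U) - (1+\mu_Q')h(\mathbf{X}+\mathbf{Z}_0|U) \le \frac{1}{2}\log|\mathbf{B}^*+\mathbf{N}| + \frac{\mu_Q'}{2}\log|\mathbf{B}^*+\mathbf{N}_Q| - \frac{1+\mu_Q'}{2}\log|\mathbf{B}^*+\mathbf{N}_0|. \tag{109}$$

Putting together (108) and (109), we have

$$\sum_{k=1}^{Q}\mu_k h(\mathbf{X}+\mathbf{Z}_k|U) - h(\mathbf{X}+\mathbf{Z}_0|U) \le \sum_{k=1}^{Q}\frac{\mu_k}{2}\log|\mathbf{B}^*+\mathbf{N}_k| - \frac{1}{2}\log|\mathbf{B}^*+\mathbf{N}_0|.$$

This proves the induction step and hence the corollary.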
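The induction step is mostly weight bookkeeping. The sketch below checks it in the scalar (1x1) case under two assumptions of ours: the weights are $\mu_k' = \mu_k/(1-\mu_Q)$, and the $\mu_k$ in (35) sum to one with $\mu_Q < 1$. It verifies three things. First, the identity obtained by substituting (106) into (35) holds. Second, $\mathbf{N}$ lies between $\mathbf{N}_1$ and $\mathbf{N}_{Q-1}$ as in (107). Third, $(1-\mu_Q)$ times the sum of the right-hand sides of (108) and (109) equals the final right-hand side, because $(1-\mu_Q)(1+\mu_Q') = 1$. The function name and sample values are hypothetical.

```python
import math

def induction_check(b, noise, mu):
    """Scalar check of the weight bookkeeping in the induction step.

    b     -- stand-in for B* (a positive scalar)
    noise -- [N_1, ..., N_Q], nondecreasing
    mu    -- [mu_1, ..., mu_Q], positive, summing to one, with mu_Q < 1
    Returns (residual of the identity after (108), gap between
    (1 - mu_Q) * [RHS(108) + RHS(109)] and the final right-hand side).
    """
    Q = len(mu)
    n0 = 1 / sum(m / (b + nk) for m, nk in zip(mu, noise)) - b          # N_0 from (35)
    mup = [m / (1 - mu[-1]) for m in mu]                                 # assumed mu'_k
    nn = 1 / sum(mup[k] / (b + noise[k]) for k in range(Q - 1)) - b      # N from (106)
    assert noise[0] <= nn <= noise[Q - 2]                                 # (107)
    residual = 1 / (b + nn) + mup[-1] / (b + noise[-1]) - (1 + mup[-1]) / (b + n0)
    rhs108 = sum(mup[k] / 2 * math.log(b + noise[k]) for k in range(Q - 1)) - 0.5 * math.log(b + nn)
    rhs109 = (0.5 * math.log(b + nn) + mup[-1] / 2 * math.log(b + noise[-1])
              - (1 + mup[-1]) / 2 * math.log(b + n0))
    final = sum(m / 2 * math.log(b + nk) for m, nk in zip(mu, noise)) - 0.5 * math.log(b + n0)
    return residual, (1 - mu[-1]) * (rhs108 + rhs109) - final

print(induction_check(0.7, [0.2, 0.5, 1.3], [0.2, 0.5, 0.3]))
```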